(54) Title: MUTANT AEQUOREA VICTORIA FLUORESCENT PROTEINS HAVING INCREASED CELLULAR FLUORESCENCE
(57) Abstract 

The present invention is directed to mutants of the jellyfish Aequorea victoria green fluorescent protein (GFP) having at least 5 and
preferably greater than 20 times the specific green fluorescence of the wild type protein. In other embodiments, the invention comprises
mutant blue fluorescent proteins (BFPs) that emit an enhanced blue fluorescence. The invention also encompasses the expression of
nucleic acids that encode a mutant GFP or BFP in a wide variety of engineered host cells, and the isolation of engineered proteins having
increased fluorescent activity. The novel mutants of the present invention allow for a significantly more sensitive detection of fluorescence
in engineered host cells than is possible with GFP or with its known mutants. Thus, the mutant fluorescent proteins provided herein can be
used as sensitive reporter molecules to detect the cell- and tissue-specific expression and subcellular compartmentalization of GFP or BFP
mutants, or of chimeric proteins comprising GFP or BFP mutants fused to a regulatory sequence or to a second protein sequence.









WO 97/42320 



PCT/US97/07625 



1 



MUTANT AEQUOREA VICTORIA FLUORESCENT PROTEINS 
HAVING INCREASED CELLULAR FLUORESCENCE 

FIELD OF THE INVENTION

This invention generally relates to novel proteins 
and their production which are useful for detecting gene 
expression and for visualizing the subcellular targeting and 
distribution of selected proteins and peptides, among other 
things. The invention specifically relates to mutations in 
the gene coding for the jellyfish Aequorea victoria green 
fluorescent protein ("GFP"), which mutations encode mutant GFP
proteins having either an enhanced green or a blue 
fluorescence, and uses for them. 

BACKGROUND OF THE INVENTION 

Green fluorescent protein ("GFP") is a monomeric 
protein of about 27 kDa which can be isolated from the 
bioluminescent jellyfish Aequorea victoria. When wild type 
GFP is illuminated by blue or ultraviolet light, it emits a 
brilliant green fluorescence. Similar to fluorescein 
isothiocyanate, GFP absorbs ultraviolet and blue light with a 
maximum absorbance at 395 nm and a minor peak of absorbance at
470 nm, and emits green light with a maximum emission at 509 
nm with a minor peak at 540 nm. GFP fluorescence persists 
even after fixation with formaldehyde, and it is more stable 
to photobleaching than fluorescein. 

The gene for GFP has been isolated and sequenced.
Prasher, D. C. et al. (1992), "Primary structure of the
Aequorea victoria green fluorescent protein," Gene 111:229-233.
Expression vectors that comprise the GFP gene or cDNA
have been introduced into a variety of host cells. These host
cells include: Chinese hamster ovary (CHO) cells, human
embryonic kidney cells (HEK293), COS-1 monkey cells, myeloma
cells, NIH 3T3 mouse fibroblasts, PtK1 cells, BHK cells, PC12
cells, Xenopus, leech, transgenic zebrafish, transgenic mice,
Drosophila and several plants. The GFP molecules expressed by
these different cells have a fluorescence similar to that of the
native molecules, demonstrating that GFP fluorescence does
not require any species-specific cofactors or substrates.
See, e.g., Baulcombe, D. et al. (1995), "Jellyfish green
fluorescent protein as a reporter for virus infections," The
Plant Journal 7:1045-1053; Chalfie, M. et al. (1994), "Green
fluorescent protein as a marker for gene expression," Science
263:802-805; Inouye, S. & Tsuji, F. (1994), "Aequorea green
fluorescent protein: expression of the gene and fluorescent
characteristics of the recombinant protein," FEBS Letters
341:277-280; Inouye, S. & Tsuji, F. (1994), "Evidence for
redox forms of the Aequorea green fluorescent protein," FEBS
Letters 351:211-214; Kain, S. et al. (1995), "The green
fluorescent protein as a reporter of gene expression and
protein localization," BioTechniques (in press); Kitts, P. et
al. (1995), "Green Fluorescent Protein (GFP): A novel reporter
for monitoring gene expression in living organisms,"
CLONTECHniques X(1):1-3; Lo, D. et al. (1994), "Neuronal
transfection in brain slices using particle-mediated gene
transfer," Neuron 13:1263-1268; Moss, J. B. & Rosenthal, N.
(1994), "Analysis of gene expression patterns in the embryonic
mouse myotome with the green fluorescent protein, a new vital
marker," J. Cell. Biochem., Supplement 18D, W161; Niedz, R. et
al. (1995), "Green fluorescent protein: an in vivo reporter of
plant gene expression," Plant Cell Reports 14:403-406; Wu,
G.-I. et al. (1995), "Infection of frog neurons with vaccinia
virus permits in vivo expression of foreign proteins," Neuron
14:681-684; Yu, J. & van den Engh, G. (1995), "Flow-sort and
growth of single bacterial cells transformed with cosmid and
plasmid vectors that include the gene for green-fluorescent
protein as a visible marker," Abstracts of papers presented at
the 1995 meeting on "Genome Mapping and Sequencing," Cold
Spring Harbor, p. 293.

The active GFP chromophore is a hexapeptide which 
contains a cyclized Ser-dehydroTyr-Gly trimer at positions 65-67.
This chromophore is only fluorescent when embedded
within the intact GFP protein. Chromophore formation occurs 
post-translationally; nascent GFP is not fluorescent. The 
chromophore is thought to be formed by a cyclization reaction 
and an oxidation step that requires molecular oxygen. 

Proteins can be fused to the amino (N-) or carboxy 
(C-) terminus of GFP. Such fused proteins have been shown to 
retain the fluorescent properties of GFP and the functional 
properties of the fusion partner. Bian, J. et al. (1995),
"Nuclear localization of HIV-1 matrix protein P17: The use of
A. victoria GFP in protein tagging and tracing," FASEB J.
9:A1279; Flach, J. et al. (1994), "A yeast RNA-binding
protein shuttles between the nucleus and the cytoplasm," Mol.
Cell. Biol. 14:8399-8407; Marshall, J. et al. (1995), "The
jellyfish green fluorescent protein: a new tool for studying
ion channel expression and function," Neuron 14:211-215;
Olmsted, J. et al. (1994), "Green Fluorescent Protein (GFP)
chimeras as reporters for MAP4 behavior in living cells," Mol.
Biol. of the Cell 5:167a; Rizzuto, R. et al. (1995), "Chimeric
green fluorescent protein as a tool for visualizing
subcellular organelles in living cells," Current Biol.
5:635-642; Sengupta, P. et al. (1994), "The C. elegans gene
odr-7 encodes an olfactory-specific member of the nuclear
receptor superfamily," Cell 79:971-980; Stearns, T. (1995),
"The green revolution," Current Biol. 5:262-264; Treinin, M. &
Chalfie, M. (1995), "A mutated acetylcholine receptor subunit
causes neuronal degeneration in C. elegans," Neuron 14:871-877;
Wang, S. & Hazelrigg, T. (1994), "Implications for bcd
mRNA localization from spatial distribution of exu protein in
Drosophila oogenesis," Nature 369:400-403.

A number of GFP mutants have been reported. 
Delagrave, S. et al. (1995), "Red-shifted excitation mutants of
the green fluorescent protein," Bio/Technology 13:151-154;
Heim, R. et al. (1994), "Wavelength mutations and
posttranslational autoxidation of green fluorescent protein,"
Proc. Natl. Acad. Sci. USA 91:12501-12504; Heim, R. et al.
(1995), "Improved green fluorescence," Nature 373:663-664.
Delagrave et al. (1995) Bio/Technology 13:151-154 isolated
mutants of cloned Aequorea victoria GFP that had red-shifted
excitation spectra. Heim, R. et al. (1994) "Wavelength
mutations and posttranslational autoxidation of green
fluorescent protein," Proc. Natl. Acad. Sci. USA 91:12501-12504,
reported a mutant (Tyr66 to His) having a blue
fluorescence, which, in the residue numbering used herein, is
designated BFP(Tyr67→His). These references neither taught
nor suggested that their mutations increased the cellular
fluorescence of the mutant GFPs.

In general, the level of fluorescence of a protein
expressed in a cell depends on several factors, such as the number
of copies of the fluorescent protein that are made, the stability of the
protein, the efficiency of chromophore formation, and
interactions with cellular solvents, solutes and structures.
Although the fluorescent signal from wild type GFP or from the
reported mutants is generally adequate for bulk detection of
abundantly expressed GFP or of GFP-containing chimeras, it is
inadequate for detecting transiently low or constitutively low
levels of expression, or for performing fine structural
subcellular localizations. This limitation severely restricts
the use of native GFP or of the reported mutants as a
biochemical and structural marker for gene expression and
morphological studies.

SUMMARY OF THE INVENTION

It is an object of the invention to provide engineered
GFP-encoding nucleic acid sequences that encode modified GFP
molecules having a greater cellular fluorescence than wild
type GFP or previously described recombinant GFP.

It is a further object of this invention to provide 
recombinant vectors containing these modified GFP-encoding 
nucleic acid sequences, which vectors are capable of being 
inserted into a variety of cells (including mammalian and
other eukaryotic cells) and of expressing the modified GFP.

It is also an object of this invention to provide 
host cells capable of providing useful quantities of 
homogeneous modified GFP. 
It is yet another object of this invention to
provide peptides that possess a greater cellular fluorescence 
than native GFP or unaltered recombinant GFP and that can be 
produced in large quantities in a laboratory, by a 
microorganism or by a cell in culture. 

These and other objects of the invention have been 
accomplished by providing mutant GFP-encoding nucleic acids 
whose gene product exhibits an increased cellular fluorescence 
relative to naturally occurring or recombinantly produced wild 
type GFP ("wtGFP"). In some embodiments, the modified GFPs
possess fluorescent activity that is 50- to 100-fold greater than
that of unmodified GFP.

The modified proteins of the present invention are 
produced by making mutations in a genetic sequence that 
result in alterations in the amino acid sequence of the 
resulting gene product. Our starting material was a GFP-encoding
nucleic acid wherein a codon encoding an additional
amino acid was inserted at position 2 of the previously
published GFP amino acid sequence (Chalfie et al., 1994), to
introduce a useful restriction site. Due to this amino acid
insertion at position 2 of the GFP amino acid sequence, our
numbering of the GFP amino acids and our description of the
amino acid mutations are off by one as compared to the originally
reported wild type GFP sequence (Prasher et al., 1992). Thus,
amino acid 65 by our numbering corresponds to amino acid 64 of
the originally reported wild type GFP, amino acid 168
corresponds to amino acid 167 of the originally reported wild
type GFP, etc.
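The off-by-one convention can be made concrete with a small conversion helper. The sketch below is illustrative only: the function names are hypothetical, and it assumes, following the preceding paragraph, that the extra residue occupies position 2 of the numbering used herein, so that every later position is one higher than in Prasher et al. (1992). The definition of wtGFP describes the insertion as being "after position 2", so the insertion point is kept as a parameter; for the mutation positions discussed in this disclosure (65 and beyond) both readings give the same result.

```python
# Hypothetical helpers for converting residue numbers between the numbering
# used herein (239 residues) and the original 238-residue numbering of
# Prasher et al. (1992).

# Assumption: the inserted residue sits at position 2 of the numbering used
# herein; only residues after the insertion are shifted by one.
INSERTED_POSITION = 2


def to_original_numbering(pos: int) -> int:
    """Map a position in the numbering used herein to the original numbering."""
    if pos < 1 or pos > 239:
        raise ValueError(f"position {pos} is outside the 239-residue sequence")
    if pos == INSERTED_POSITION:
        raise ValueError("the inserted residue has no counterpart in the original sequence")
    return pos - 1 if pos > INSERTED_POSITION else pos


def from_original_numbering(pos: int) -> int:
    """Map a position in the original 238-residue numbering to the numbering used herein."""
    if pos < 1 or pos > 238:
        raise ValueError(f"position {pos} is outside the 238-residue sequence")
    return pos + 1 if pos >= INSERTED_POSITION else pos


# The two correspondences given in the text.
print(to_original_numbering(65), to_original_numbering(168))  # prints: 64 167
```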

Using the modified wild type GFP described herein, a 
number of the unique mutants described herein derive from the 
discovery of an unplanned and unexpected mutation called 
"SG12," obtained in the course of site-directed mutagenesis
experiments, wherein a phenylalanine at position 65 of wtGFP
was converted to leucine. A mutant referred to as "SG11,"
which combined the phenylalanine 65 to leucine alteration with
an isoleucine 168 to threonine substitution and a lysine 239
to asparagine substitution, gave a further enhanced
fluorescence intensity. The lysine 239 to asparagine
substitution does not affect the fluorescence of GFP; indeed,
the C-terminal lysine or asparagine may be deleted without
affecting fluorescence. A third, further improved GFP
mutant was obtained by further mutating "SG11." This mutant
is referred to as "SG25" and, in addition to the SG11
mutations, contains a substitution of a cysteine for the
serine normally found at position 66 of the sequence.

In addition, the invention encompasses novel GFP
mutants that emit a blue fluorescence. These blue mutants are
derived from a mutant of wild type GFP (Heim, R. et al.
(1994) "Wavelength mutations and posttranslational
autoxidation of green fluorescent protein," Proc. Natl. Acad.
Sci. USA 91:12501-12504), in which histidine was substituted
for tyrosine at amino acid position 66 (position 67 in the
numbering used herein). This mutant emits a blue
fluorescence, i.e., it is a Blue Fluorescent Protein
(BFP).

Novel BFP mutants having an enhanced blue
fluorescence were made by further modifying this
BFP(Tyr67→His). The introduction of the same mutation used to
generate SG12 (i.e., phenylalanine to leucine at position 65)
into BFP(Tyr67→His) resulted in a new mutant having a brighter
fluorescence, designated "SuperBlue-42" (SB42). A second,
independently generated mutant of BFP(Tyr67→His), in which a
valine at position 164 was converted to alanine, also emitted
an enhanced blue fluorescent signal and is referred to as
"SB49." A combination of the above two mutations resulted in
"SB50," which exhibited an even greater fluorescence
enhancement than either of the previous mutations.
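The relationships among the named mutants can be summarized as sets of substitutions in the numbering used herein. The sketch below is illustrative: the dictionary layout and the `apply_mutations` helper are hypothetical, and because SEQ ID NO:2 is not reproduced in this section, the usage example runs on a placeholder sequence (not the real GFP sequence) that carries only the wild type residues named in the text: Phe65, Ser66, Tyr67, Val164, Ile168 and Lys239.

```python
# Hypothetical summary of the named mutants as (wild type residue, position,
# new residue) tuples, using the residue numbering of this document.
BFP_Y67H = {("Y", 67, "H")}

MUTANTS = {
    "SG12": {("F", 65, "L")},
    "SG11": {("F", 65, "L"), ("I", 168, "T"), ("K", 239, "N")},
    "SB42": BFP_Y67H | {("F", 65, "L")},
    "SB49": BFP_Y67H | {("V", 164, "A")},
}
MUTANTS["SG25"] = MUTANTS["SG11"] | {("S", 66, "C")}   # SG11 plus Ser66Cys
MUTANTS["SB50"] = MUTANTS["SB42"] | MUTANTS["SB49"]    # both BFP improvements


def apply_mutations(seq: str, mutations: set) -> str:
    """Return a copy of seq with each substitution applied (1-based positions).

    Raises ValueError if the residue at a position is not the expected wild
    type residue, which catches numbering mistakes such as an off-by-one shift.
    """
    residues = list(seq)
    for wt, pos, new in sorted(mutations, key=lambda m: m[1]):
        if residues[pos - 1] != wt:
            raise ValueError(f"expected {wt} at {pos}, found {residues[pos - 1]}")
        residues[pos - 1] = new
    return "".join(residues)


# Placeholder 239-residue sequence: alanine everywhere except the wild type
# residues at the positions used above.
template = ["A"] * 239
for wt, pos in [("F", 65), ("S", 66), ("Y", 67), ("V", 164), ("I", 168), ("K", 239)]:
    template[pos - 1] = wt
placeholder = "".join(template)

sb50 = apply_mutations(placeholder, MUTANTS["SB50"])
print(sb50[64:67], sb50[163])  # residues 65-67 and 164: LSH A
```

Checking the expected wild type residue before each substitution is a cheap guard against mixing the two numbering conventions described above.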

The novel GFP and BFP mutants of this invention
allow for a significantly more sensitive detection of
fluorescence in host cells than is possible with the wild type
protein. Accordingly, the mutant GFPs provided herein can be
used, among other things, as sensitive reporter molecules to
detect the cell- and tissue-specific expression and subcellular
compartmentalization of GFP or of chimeric proteins comprising
GFP fused to a regulatory sequence or to a second protein
sequence. In addition, these mutations make possible a
variety of one- and two-color protein assays to quantitate
expression in mammalian cells.



DETAILED DESCRIPTION OF THE INVENTION 

The present invention comprises mutant nucleic acids 
that encode engineered GFPs having a greater cellular 
fluorescence than either native GFP or unaltered ("wild type") 
recombinant GFP, and the mutant GFPs themselves. It further 
comprises a subset of mutant GFPs that are mutant blue 
fluorescent proteins ("BFPs") that are derived from a 
published BFP, designated BFP(Tyr67→His), wherein the mutant
BFPs have a cellular fluorescence that is at least five times
greater, preferably ten times greater, and most preferably 20
times greater than that of BFP(Tyr67→His). The invention also
encompasses compositions such as vectors and cells that
comprise either the mutant nucleic acids or the mutant protein
gene products. The mutant GFP nucleic acids and proteins may
be used to detect and quantify gene expression in living
cells, and to detect and quantify tissue-specific expression
and subcellular distribution of GFP or of GFP fused to other
proteins.

I. General Definitions

Unless defined otherwise, all technical and 
scientific terms used herein have the same meaning as commonly 
understood by one of ordinary skill in the art to which this 
invention belongs. Singleton et al. (1994) Dictionary of 
Microbiology and Molecular Biology, second edition, John Wiley 
and Sons (New York) provides one of skill with a general 
dictionary of many of the terms used in this invention. 
Although any methods and materials similar or equivalent to 
those described herein can be used in the practice or testing 
of the present invention, the preferred methods and materials 
are described. For purposes of the present invention, the 
following terms are defined below. 
The symbols, abbreviations and definitions used
herein are set forth below:

DNA, deoxyribonucleic acid
RNA, ribonucleic acid
mRNA, messenger RNA
cDNA, complementary DNA (enzymatically synthesized from an mRNA sequence)
A, Adenine
T, Thymine
G, Guanine
C, Cytosine
U, Uracil
GFP, Green Fluorescent Protein
BFP, Blue Fluorescent Protein

Amino acids are sometimes referred to herein by the 
conventional one or three letter codes. 

Wild type green fluorescent protein ("wtGFP") refers
to the 239 amino acid sequence described by Chalfie et al.
(Science 263:802-805, 1994), the nucleotide sequence of which
is set out as SEQ ID NO:1, and the amino acid sequence of
which is set out as SEQ ID NO:2. This sequence differs from
the original 238 amino acid GFP isolated from the
bioluminescent jellyfish Aequorea victoria in that one amino
acid has been inserted after position 2 of the 238 amino acid
sequence. When reference is made in this application to an
amino acid position of GFP, the position is numbered with
reference to the sequence described by Chalfie et al., supra,
and thus to SEQ ID NO:2.

The term "blue fluorescent protein" (BFP) refers to
mutants of wtGFP wherein the tyrosine at position 67 is
converted to a histidine, which mutants emit a blue
fluorescence. The non-limiting prototype is herein designated
BFP(Tyr67→His).

A shorthand designation for mutations that result in
a change in amino acid sequence is the one- or three-letter
code for the original amino acid, followed by the number of
the position of the amino acid in the wtGFP sequence, followed
by the one- or three-letter code for the new amino acid. Thus,
Phe65Leu or F65L both designate a mutation wherein the
phenylalanine at position 65 of the wtGFP is converted to leucine.
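Because the shorthand is regular, it can be parsed mechanically. A minimal sketch, assuming only the two forms described above (three-letter codes on both sides, or one-letter codes on both sides); the function name `parse_mutation` is hypothetical.

```python
import re

# Standard three-letter to one-letter codes for the 20 common amino acids.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}
ONE_LETTER = set(THREE_TO_ONE.values())

_THREE = re.compile(r"^([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})$")  # e.g. Phe65Leu
_ONE = re.compile(r"^([A-Z])(\d+)([A-Z])$")                    # e.g. F65L


def parse_mutation(code: str) -> tuple:
    """Parse 'Phe65Leu' or 'F65L' into ('F', 65, 'L')."""
    m = _THREE.match(code)
    if m:
        orig, pos, new = m.groups()
        if orig not in THREE_TO_ONE or new not in THREE_TO_ONE:
            raise ValueError(f"unknown amino acid code in {code!r}")
        return THREE_TO_ONE[orig], int(pos), THREE_TO_ONE[new]
    m = _ONE.match(code)
    if m:
        orig, pos, new = m.groups()
        if orig not in ONE_LETTER or new not in ONE_LETTER:
            raise ValueError(f"unknown amino acid code in {code!r}")
        return orig, int(pos), new
    raise ValueError(f"not a recognized mutation designation: {code!r}")


print(parse_mutation("Phe65Leu"), parse_mutation("F65L"))  # both ('F', 65, 'L')
```

Normalizing both forms to one-letter tuples makes designations such as Tyr67His and Y67H directly comparable.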

Salts of any of the proteins described herein will 
naturally occur when such proteins are present in (or isolated 
from) aqueous solutions of various pHs. All salts of peptides
having the indicated biological activity are considered to be
within the scope of the present invention. Examples include
alkali, alkaline earth, and other metal salts of carboxylic
acid residues, acid addition salts (e.g., HCl) of amino
residues, and zwitterions formed by reactions between
carboxylic acid and amino residues within the same
molecule.

The terms "bioluminescent" and "fluorescent" refer
to the ability of GFP or of a derivative thereof to emit light 
("emitted or fluorescent light") of a characteristic 
wavelength when excited by light which is generally of a 
characteristic and different wavelength than that used to 
generate the emission. 

The term "cellular fluorescence" denotes the 
fluorescence of a GFP-derived protein of the present invention 
when expressed in a cell, especially a mammalian cell. 

The term "nucleic acid" refers to a 
deoxyribonucleotide or ribonucleotide polymer in either 
single- or double-stranded form, and unless specifically
limited, encompasses known analogues of natural nucleotides 
that hybridize to nucleic acids in a manner similar to 
naturally occurring nucleotides. Unless otherwise indicated, 
a particular nucleic acid sequence implicitly provides the 
complementary sequence thereof, as well as the sequence 
explicitly indicated. As used herein, the terms "nucleic 
acid" and "gene" are interchangeable, and they encompass the 
term "cDNA."

The phrase "a nucleic acid sequence encoding" refers 
to a nucleic acid which contains sequence information that, if 
translated, yields the primary amino acid sequence of a 
specific protein or peptide. This phrase specifically 
encompasses degenerate codons (i.e., different codons which 
encode a single amino acid) of the native sequence or 
sequences which may be introduced to conform with codon 
preference in a specific host cell. 
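Codon degeneracy can be illustrated with the standard genetic code. The sketch below builds the code table from the conventional compact encoding (codons enumerated in T, C, A, G order at each position, with '*' for stop) and lists the synonymous codons for an amino acid. The helper names are hypothetical, and host-specific codon preference, which the definition also mentions, is not modeled.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code: one amino acid (or '*' for stop) per codon, with
# codons enumerated in TCAG order at the first, second and third positions.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA coding sequence (length a multiple of 3) codon by codon."""
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("sequence length is not a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def synonymous_codons(aa: str) -> list:
    """All codons that encode the given one-letter amino acid, sorted."""
    return sorted(c for c, a in CODON_TABLE.items() if a == aa)


# Two different DNA sequences that encode the same dipeptide, Phe-Leu.
print(translate("TTCCTG"), translate("TTTTTA"))  # FL FL
print(synonymous_codons("L"))  # ['CTA', 'CTC', 'CTG', 'CTT', 'TTA', 'TTG']
```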

The phrase "nucleic acid construct" denotes a 
nucleic acid that is composed of two or more nucleic acid 
sequences that are derived from different sources and that are
ligated together using methods known in the art. 

The term "regulatory sequence" denotes all the non-coding
elements of a nucleic acid sequence required for the
correct and efficient expression of the "coding region" (i.e.,
the region that actually encodes the amino acid sequence of a
peptide or protein), e.g., binding sites for polymerases and
transcription factors, transcription and translation
initiation and termination sequences, a TATA box, a promoter to
direct transcription, a ribosome binding site for
translational initiation, polyadenylation sequences, and enhancer
elements.

The term "isolated" refers to material which is
substantially or essentially free from components which
normally accompany it as found in its native state (for
example, a band on a gel). The isolated nucleic acids and the
isolated proteins of this invention do not contain materials
normally associated with their in situ environment, in
particular, nuclear, cytosolic or membrane-associated proteins
or nucleic acids other than those nucleic acids which are
indicated. The term "homogeneous" refers to a peptide or DNA
sequence where the primary molecular structure (i.e., the
sequence of amino acids or nucleotides) of substantially all
molecules present in the composition under consideration is
identical. The term "substantially" as used in the preceding
sentences preferably means at least 80% by weight, more
preferably at least 95% by weight, and most preferably at
least 99% by weight.
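As a worked illustration of these thresholds, the sketch below takes the weight of the desired molecule and the total weight of the relevant components in a preparation and reports the highest tier met. It is hypothetical: the function name, tier labels and example weights are illustrative, and how the weights are measured is outside the scope of the definition above.

```python
def purity_tier(target_weight: float, total_weight: float) -> str:
    """Classify a preparation against the 80% / 95% / 99% (by weight) tiers."""
    if total_weight <= 0 or not 0 <= target_weight <= total_weight:
        raise ValueError("weights must satisfy 0 <= target <= total and total > 0")
    fraction = target_weight / total_weight
    if fraction >= 0.99:
        return "most preferred (>= 99%)"
    if fraction >= 0.95:
        return "more preferred (>= 95%)"
    if fraction >= 0.80:
        return "preferred (>= 80%)"
    return "below the 80% threshold"


# Hypothetical example: 9.7 mg of the desired protein in a 10 mg preparation.
print(purity_tier(9.7, 10.0))  # more preferred (>= 95%)
```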

The nucleic acids of this invention, whether RNA,
cDNA, genomic DNA, or a hybrid of the various combinations,
are synthesized in vitro or are isolated from natural sources
or recombinant clones. The nucleic acids claimed herein are
present in transformed or transfected whole cells, in
transformed or transfected cell lysates, or in a partially
purified or substantially pure form. The nucleic acids of the
present invention are obtained as homogeneous preparations.
They may be prepared by standard techniques well known in the
art, including selective precipitation with such substances as
ammonium sulfate, isopropyl alcohol, or ethyl alcohol, and/or
size exclusion, ion exchange or affinity column chromatography,
immunopurification methods, and others.

The phrase "conservatively modified variants 
thereof," when used with reference to a protein, denotes 
conservative amino acid substitutions in which both the 
original and the substituted amino acids have similar 
structure (e.g., the R group contains a carboxylic acid) and 
properties (e.g., the original and the substituted amino acids 
are acidic, such as glutamic and aspartic acid), such that the 
substitutions do not essentially alter specified properties of 
the protein, such as fluorescence. Amino acid substitutions 
that are conservative are well known in the art. The phrase 
"conservatively modified variants thereof," when used to 
describe a reference nucleic acid, denotes nucleic acids 
having nucleotide substitutions that yield degenerate codons 
for a given amino acid or that encode conservative amino acid 
substitutions, as compared to the reference nucleic acid. 

The term "recombinant" or "engineered" when used 
with reference to a nucleic acid or a protein generally 
denotes that the composition or primary sequence of said 
nucleic acid or protein has been altered from the naturally 
occurring sequence using experimental manipulations well known 
to those skilled in the art. It may also denote that a
nucleic acid or protein has been isolated and cloned into a
vector, or that the nucleic acid has been introduced into
or expressed in a cell or cellular environment other than the 
cell or cellular environment in which said nucleic acid or 
protein may be found in nature. The phrase "engineered 
Aequorea victoria fluorescent protein" specifically 
encompasses a protein obtained by introducing one or more 
sequence alterations into the coding region of a nucleic acid 
that encodes wild type Aequorea victoria GFP, wherein the gene 
product of the engineered nucleic acid is a fluorescent 
protein recognized by antisera to wild type Aequorea victoria 
GFP. 

The term "recombinant" or "engineered" when used 
with reference to a cell indicates that, as a result of 
experimental manipulation, the cell replicates or expresses a
nucleic acid or expresses a peptide or protein encoded by a
nucleic acid, whose origin is exogenous to the cell.
Recombinant cells can express nucleic acids that are not found
within the native (non-recombinant) form of the cell.
Recombinant cells can also express nucleic acids found in the
native form of the cell wherein the nucleic acids are
reintroduced into the cell by artificial means.

The term "vector" denotes an engineered nucleic acid
construct that contains sequence elements that mediate the
replication of the vector sequence and/or the expression of
coding sequences present on the vector. Examples of vectors
include eukaryotic and prokaryotic plasmids, viruses (for
example, HIV), cosmids, phagemids, and the like.

The term "operably linked" refers to functional linkage
between a first nucleic acid (for example, an expression
control sequence such as a promoter or an array of
transcription factor binding sites) and a second nucleic acid
sequence, wherein the expression control sequence directs
transcription of the nucleic acid corresponding to the second
sequence. One or more selected isolated nucleic acids may be
operably linked to a vector by methods known in the art.

"Transduction" or "transformation" denotes the 
process whereby exogenous extracellular DNA is introduced into
a cell, such that the cell is capable of replicating and/or
expressing the exogenous DNA. Generally, a selected nucleic
acid is first inserted into a vector and the vector is then
introduced into the cell. For example, plasmid DNA that is
introduced under appropriate environmental conditions may
undergo replication in the transformed cell, and the
replicated copies are distributed to progeny cells when cell
division occurs. As a result, a new cell line is established,
containing the plasmid and carrying the genetic determinants
thereof. Transformation by a plasmid in this manner, where
the plasmid genes are maintained in the cell line by plasmid
replication, occurs at high frequency when the transforming
plasmid DNA is in closed loop form, and does not occur, or
occurs only rarely, if linear plasmid DNA is used.
All the patents and publications cited in this
disclosure are indicative of the level of skill of those
skilled in the art to which this invention pertains and are
all herein individually incorporated by reference for all
purposes.

II. The GFP Mutants and Their Expression

A. The GFP mutants 

The isolated nucleic acids reported here are those 
that encode an engineered protein derived from Aequorea 
victoria green fluorescent protein ("GFP") having a
fluorescence at maximum emission that is at least five times 
greater, preferably ten times greater, and most preferably 
twenty times greater than the fluorescence at maximum emission 
of wild type GFP. In one embodiment, a nucleic acid encodes 
for leucine at amino acid position 65. This amino acid 
position is important for the enhanced fluorescence. In 
another embodiment the engineered isolated GFP nucleic acid 
also encodes for threonine at amino acid position 168. In an 
additional embodiment, the engineered isolated GFP nucleic 
acid further encodes for cysteine at amino acid position 66. 

Also described here are GFP mutants that have 
enhanced blue fluorescent properties. These mutants are encoded
by an isolated nucleic acid that encodes an engineered Aequorea
victoria blue fluorescent protein having histidine at amino
acid position 67 and leucine at amino acid position 65, and
having a cellular fluorescence that is at least five times
greater, preferably 10 times greater, most preferably 20 times
greater than that of BFP(Tyr67→His). An alternative isolated
BFP nucleic acid is one that encodes for an engineered 
Aequorea victoria blue fluorescent protein wherein the 
engineered BFP has histidine at amino acid position 67 and 
alanine at amino acid position 164. A third engineered 
isolated BFP nucleic acid sequence is one that has histidine 
at amino acid position 67, leucine at amino acid position 65 
and alanine at amino acid position 164. 
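The fold-enhancement criteria above reduce to a ratio against the appropriate reference protein: wild type GFP for the green mutants and BFP(Tyr67→His) for the blue mutants. The sketch below computes that ratio and reports the tier it meets. It assumes both intensities were measured under identical conditions and are already background-corrected; the function names and the example readings are hypothetical, not data from this disclosure.

```python
def fold_enhancement(mutant_intensity: float, reference_intensity: float) -> float:
    """Ratio of mutant to reference fluorescence at maximum emission."""
    if reference_intensity <= 0:
        raise ValueError("reference intensity must be positive")
    return mutant_intensity / reference_intensity


def enhancement_tier(fold: float) -> str:
    """Map a fold enhancement onto the 5x / 10x / 20x tiers described in the text."""
    if fold >= 20:
        return "most preferred (>= 20x)"
    if fold >= 10:
        return "preferred (>= 10x)"
    if fold >= 5:
        return "meets the minimum (>= 5x)"
    return "below 5x"


# Hypothetical readings in arbitrary units, NOT measurements from this disclosure.
fold = fold_enhancement(mutant_intensity=1240.0, reference_intensity=100.0)
print(round(fold, 1), enhancement_tier(fold))  # 12.4 preferred (>= 10x)
```

The same two functions serve for both the green and the blue mutants; only the choice of reference intensity changes.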
The nucleic acid and amino acid sequences for
wild type GFP are set out in SEQ ID NO:1 and SEQ ID NO:2. The
sequence is well known, well described and readily available
for manipulation and use. Vectors bearing the nucleic acid
sequence are readily available commercially from, for example,
Clontech Laboratories, Inc., Palo Alto, CA. Clontech provides
a line of reporter vectors for GFP, including the cDNA
construct described by Chalfie et al., supra, a promoterless
GFP vector for monitoring the expression of cloned promoters
in mammalian cells, and a series of vectors for creating
fusion proteins to either the amino or carboxy terminus of GFP.

One of skill in the art will recognize many ways of
generating alterations in a given nucleic acid sequence. Such
well-known methods include site-directed mutagenesis, PCR
amplification using degenerate oligonucleotides, exposure of
cells containing the nucleic acid to mutagenic agents or
radiation, chemical synthesis of a desired oligonucleotide
(e.g., in conjunction with ligation and/or cloning to generate
large nucleic acids) and other well-known techniques. See,
e.g., Berger and Kimmel, Guide to Molecular Cloning
Techniques, Methods in Enzymology Volume 152, Academic Press,
Inc., San Diego, CA (Berger); Sambrook et al. (1989) Molecular
Cloning - A Laboratory Manual (2nd ed.) Vol. 1-3, Cold Spring
Harbor Laboratory, Cold Spring Harbor Press, NY (Sambrook);
and Current Protocols in Molecular Biology, F.M. Ausubel et
al., eds., Current Protocols, a joint venture between Greene
Publishing Associates, Inc. and John Wiley & Sons, Inc. (1994
Supplement) (Ausubel); Pirrung et al., U.S. Patent No.
5,143,854; and Fodor et al., Science 251:767-77 (1991).

Product information from manufacturers of biological reagents
and experimental equipment also provides information useful in
known biological methods. Such manufacturers include the
SIGMA Chemical Company (Saint Louis, MO), R&D Systems
(Minneapolis, MN), Pharmacia LKB Biotechnology (Piscataway,
NJ), CLONTECH Laboratories, Inc. (Palo Alto, CA), Chem Genes
Corp., Aldrich Chemical Company (Milwaukee, WI), Glen
Research, Inc., GIBCO BRL Life Technologies, Inc.
(Gaithersburg, MD), Fluka Chemica-Biochemika Analytika (Fluka
Chemie AG, Buchs, Switzerland), and Applied Biosystems (Foster
City, CA), as well as many other commercial sources known to
one of skill. Using these techniques, it is possible to
substitute at will any nucleotide in a nucleic acid that
encodes any GFP or BFP disclosed herein, or any amino acid in
a GFP or BFP described herein, for a predetermined nucleotide
or amino acid. For example, it is possible to generate at will
modified GFPs and BFP (Tyr 67 → His) proteins that contain
leucine at position 65 and one, two or three additional
mutations at any other position of the wtGFP or BFP
(Tyr 67 → His).
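Substituting a predetermined amino acid at a chosen position can be sketched at the protein-sequence level. The helper below is hypothetical and uses the common one-letter shorthand (for example, `Y67H` for Tyr 67 → His); it checks that the expected starting residue is present before replacing it, which catches numbering mistakes. It does not design the codon change or mutagenic primers that would make the substitution in the nucleic acid.

```python
import re

# One-letter shorthand: original residue, 1-based position, new residue (e.g. "Y67H").
MUTATION = re.compile(r"([A-Z])(\d+)([A-Z])")


def apply_substitutions(protein: str, mutations: list[str]) -> str:
    """Return `protein` with each listed point substitution applied in order.

    Raises ValueError if a mutation is malformed, falls outside the sequence,
    or names a starting residue that is not actually present.
    """
    residues = list(protein)
    for mutation in mutations:
        match = MUTATION.fullmatch(mutation)
        if match is None:
            raise ValueError(f"unrecognized mutation: {mutation!r}")
        old, pos, new = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= pos <= len(residues):
            raise ValueError(f"{mutation}: position outside 1..{len(residues)}")
        if residues[pos - 1] != old:
            raise ValueError(f"{mutation}: found {residues[pos - 1]} at position {pos}")
        residues[pos - 1] = new
    return "".join(residues)
```

On the short hypothetical peptide `"MSKGEEL"`, `apply_substitutions("MSKGEEL", ["S2T", "L7F"])` returns `"MTKGEEF"`; combining a base substitution with one to three further substitutions is just a longer `mutations` list.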

The sequence of the cloned genes and synthetic
oligonucleotides can be verified using the chemical
degradation method of A.M. Maxam et al. (1980), Methods in
Enzymology 65:499-560. The sequence can be confirmed after
the assembly of the oligonucleotide fragments into the
double-stranded DNA sequence using the method of Maxam and
Gilbert, supra, or the chain termination method for sequencing
double-stranded templates of R.B. Wallace et al. (1981), Gene,
16:21-26. DNA sequencing may also be performed by the
PCR-assisted fluorescent terminator method (ReadyReaction
DyeDeoxy Terminator Cycle Sequencing Kit, ABI, Columbia, MD)
according to the manufacturer's instructions, using the ABI
Model 373A DNA Sequencing System. Sequencing data are analyzed
using the commercially available Sequencher program (Gene
Codes, Ann Arbor, MI).
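Verifying a construct comes down to comparing the sequencing result with the designed sequence. A minimal sketch, assuming the read has already been aligned to the design without gaps (alignment and trace-quality handling, which programs such as Sequencher provide, are not attempted here): list the differing bases and map each one to its codon number. The function names are hypothetical, and ambiguous `N` calls are skipped rather than reported.

```python
def find_mismatches(expected: str, observed: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, expected base, observed base) for each difference.

    Assumes both sequences are already aligned without gaps; ambiguous 'N'
    calls in the read are ignored rather than reported.
    """
    if len(expected) != len(observed):
        raise ValueError("sequences must be pre-aligned to equal length")
    return [
        (pos, exp, obs)
        for pos, (exp, obs) in enumerate(zip(expected.upper(), observed.upper()), start=1)
        if obs != "N" and exp != obs
    ]


def affected_codons(mismatches: list[tuple[int, str, str]]) -> list[int]:
    """Codon numbers (1-based) hit by the mismatches.

    Assumes position 1 is the first base of codon 1 (i.e. the start codon).
    """
    return sorted({(pos - 1) // 3 + 1 for pos, _, _ in mismatches})
```

An unintended mismatch inside the codons targeted for mutagenesis, or anywhere else in the coding region, flags the clone for re-sequencing or rejection.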

B - Expression of Mutant GFP

Clearly, the nucleic acid sequences of the present
invention are excellent reporter sequences, since the expressed
proteins can be readily detected by fluorescence as described
below. The sequences can be used in any application known to
date for GFP, and further in applications where a greater
degree of fluorescence is required. Expression of the
sequences described herein, either alone or in combination
with other sequences of interest, is described below.
Vectors to which selected foreign nucleic acids are
operably linked may be used to introduce these selected
nucleic acids into host cells and mediate their replication
and/or expression. Cloning vectors are useful for replicating
the foreign nucleic acids and obtaining clones of specific
foreign nucleic acid-containing vectors. Expression vectors
mediate the expression of the foreign nucleic acid. Some
vectors are both cloning and expression vectors.

Once a nucleic acid is synthesized or isolated and
inserted into a vector and cloned, one may express the nucleic
acid in a variety of recombinantly engineered cells known to
those of skill in the art. As used herein, "expression"
refers to transcription of nucleic acids, either without or
preferably with subsequent translation.

Expression of a mutant BFP or of wild type or mutant
GFP can be enhanced by including multiple copies of the
GFP-encoding nucleic acid in a transformed host, by selecting
a vector known to reproduce in the host, thereby producing
large quantities of protein from exogenous inserted DNA (such
as pUC8, ptac12, or pIN-III-ompA1, 2, or 3), or by any other
known means of enhancing peptide expression. In all cases,
wtGFP or mutant GFPs will be expressed when the DNA sequence
is functionally inserted into a vector. "Functionally
inserted" means that it is inserted in proper reading frame
and orientation. Typically, a GFP gene will be inserted
downstream from a promoter and will be followed by a stop
codon, although production as a hybrid protein followed by
cleavage may be used, if desired.
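Part of what "functionally inserted" requires can be checked in silico. The sketch below, with hypothetical function names, tests whether an insert reads as a single open reading frame (ATG start, length a multiple of three, a terminal stop codon, no internal stops) and in which orientation it does so. It assumes the insert runs exactly from start codon to stop codon, and it does not check the frame relative to an upstream fusion partner or the position of the vector's promoter.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_single_orf(cds: str) -> bool:
    """ATG start, length divisible by 3, terminal stop, and no internal stop codons."""
    cds = cds.upper()
    if len(cds) < 6 or len(cds) % 3 != 0 or not cds.startswith("ATG"):
        return False
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    return codons[-1] in STOP_CODONS and not any(c in STOP_CODONS for c in codons[:-1])


def orf_orientation(insert: str) -> Optional[str]:
    """'forward' or 'reverse' if the insert is an ORF on that strand, else None."""
    if is_single_orf(insert):
        return "forward"
    if is_single_orf(reverse_complement(insert)):
        return "reverse"
    return None
```

An insert reported as `"reverse"` would need to be re-cloned or flipped before it could be expressed from a promoter lying upstream of the forward strand.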

Examples of cells which are suitable for the cloning
and expression of the nucleic acids of the invention include
bacteria, yeast, filamentous fungi, insect (especially
employing baculoviral vectors), and mammalian cells, in
particular cells capable of being maintained in tissue
culture.

Host cells are competent or rendered competent for
transformation by various means. There are several well-known
methods of introducing DNA into animal cells. These include:
calcium phosphate precipitation, fusion of the recipient cells
with bacterial protoplasts containing the DNA, treatment of
the recipient cells with liposomes containing the DNA,
DEAE-dextran, receptor-mediated endocytosis, electroporation
and microinjection of the DNA directly into the cells.

It is expected that those of skill in the art are 
knowledgeable in the numerous systems available for cloning 
and expression of nucleic acids. In brief summary, the 
expression of natural or synthetic nucleic acids is typically 
achieved by operably linking a nucleic acid of interest to a 
promoter (which is either constitutive or inducible) , and 
incorporating the construct into an expression vector. The 
vectors are suitable for replication and integration in 
prokaryotes, eukaryotes, or both. Typical cloning vectors 
contain transcription and translation terminators, 
transcription and translation initiation sequences, and 
promoters useful for regulation of the expression of the 
particular nucleic acid. The vectors optionally comprise 
generic expression cassettes containing at least one 
independent terminator sequence, sequences permitting 
replication of the cassette in eukaryotes, or prokaryotes, or 
both, (e.g., shuttle vectors) and selection markers for both 
prokaryotic and eukaryotic systems. See, e.g., Sambrook and
Ausubel (both supra).

1. Expression in Prokaryotes 

Prokaryotic systems for cloning and/or expressing
engineered GFP or BFP proteins are available using E. coli,
Bacillus sp. and Salmonella (Palva, I. et al. (1983), Gene
22:229-235; Mosbach, K. et al. (1983), Nature 302:543-545). To
obtain high level expression of a cloned nucleic acid, such as
one encoding an engineered GFP or BFP, in a prokaryotic
system, it is essential to construct expression vectors which
contain, at a minimum, a strong promoter to direct
transcription, a ribosome binding site for translational
initiation, a transcription/translation terminator, a
bacterial replicon, a nucleic acid encoding antibiotic
resistance to permit selection of bacteria that harbor
recombinant plasmids, and unique restriction sites in
nonessential regions of the plasmid to allow insertion of
foreign nucleic acids. The particular antibiotic resistance
gene chosen is not critical; any of the many resistance genes
known in the art are suitable. Examples of regulatory regions
suitable for this purpose in E. coli are the promoter and
operator region of the E. coli tryptophan biosynthetic pathway
as described by Yanofsky, C. (1984), J. Bacteriol.
158:1018-1024, and the leftward promoter of phage lambda (PL)
as described by Herskowitz, I. and Hagen, D. (1980), Ann. Rev.
Genet. 14:399-445.

The particular vector used to transport the genetic
information into the cell is not particularly critical. Any
of the conventional vectors used for replication, cloning
and/or expression in prokaryotic cells may be used.

The foreign nucleic acid can be incorporated into a
nonessential region of the host cell's chromosome. This is
achieved by first inserting the nucleic acid into a vector
such that it is flanked by regions of DNA homologous to the
insertion site in the host chromosome. After introduction of
the vector into a host cell, the foreign nucleic acid is
incorporated into the chromosome by homologous recombination
between the flanking sequences and chromosomal DNA.

Detection of the expressed protein is achieved by
methods known in the art, such as radioimmunoassays, Western
blotting techniques or immunoprecipitation. Purification from
E. coli can be achieved following procedures described in U.S.
Patent No. 4,511,503.



2. Expression in Eukaryotes 

Standard eukaryotic transfection methods are used to
produce mammalian, yeast or insect cell lines which express
large quantities of engineered GFP or BFP protein, which is
then purified using standard techniques. See, e.g., Colley et
al. (1989), J. Biol. Chem. 264:17619-17622; Guide to Protein
Purification, in Vol. 182 of Methods in Enzymology (Deutscher
ed., 1990); D.A. Morrison (1977), J. Bacteriol. 132:349-351;
and J.E. Clark-Curtiss and R. Curtiss (1983), Methods in
Enzymology 101:347-362, Eds. R. Wu et al., Academic Press,
New York.

The particular eukaryotic expression vector used to
transport the genetic information into the cell is not
particularly critical. Any of the conventional vectors used
for expression in eukaryotic cells may be used. Expression
vectors containing regulatory elements from eukaryotic viruses
such as retroviruses are typically used. SV40 vectors include
pSVT7 and pMT2. Vectors derived from bovine papilloma virus
include pBV-1MTHA, and vectors derived from Epstein Barr virus
include pHEBO and p205. Other exemplary vectors include
pMSG, pAV009/A+, pMTO10/A+, pMAMneo-5, baculovirus pDSVE, and
any other vector allowing expression of proteins under the
direction of the SV-40 early promoter, SV-40 late promoter,
metallothionein promoter, murine mammary tumor virus promoter,
Rous sarcoma virus promoter, polyhedrin promoter, or other
promoters shown effective for expression in eukaryotic cells.

The expression vector typically comprises a
eukaryotic transcription unit or expression cassette that
contains all the elements required for the expression of the
engineered GFP or BFP DNA in eukaryotic cells. A typical
expression cassette contains a promoter operably linked to the
DNA sequence encoding an engineered GFP or BFP protein and
signals required for efficient polyadenylation of the
transcript.

Eukaryotic promoters typically contain two types of 
recognition sequences, the TATA box and upstream promoter 
elements. The TATA box, located 25-30 base pairs upstream of 
the transcription initiation site, is thought to be involved 
in directing RNA polymerase to begin RNA synthesis. The other 
upstream promoter elements determine the rate at which 
transcription is initiated. 
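The positional rule for the TATA box can be expressed as a simple scan of promoter sequence. This sketch assumes the consensus TATA(A/T)A(A/T) as the motif and counts distance from the motif's first base to the transcription start site; both are simplifying assumptions, since natural TATA boxes often deviate from the consensus and conventions for measuring the distance differ. The function name is hypothetical.

```python
import re

# Simplified TATA consensus, TATA(A/T)A(A/T); the lookahead lets matches overlap.
TATA_MOTIF = re.compile(r"(?=(TATA[AT]A[AT]))")


def tata_box_distances(upstream: str, min_bp: int = 25, max_bp: int = 30) -> list[int]:
    """Distances (bp upstream of the transcription start site) of TATA-like motifs.

    `upstream` is the sequence immediately 5' of the start site, so its last
    base is position -1; a motif's distance is counted from its first base.
    Only motifs within [min_bp, max_bp] are returned.
    """
    seq = upstream.upper()
    return [
        len(seq) - m.start()
        for m in TATA_MOTIF.finditer(seq)
        if min_bp <= len(seq) - m.start() <= max_bp
    ]
```

A motif found well outside the 25-30 bp window would not be expected to position RNA polymerase at the intended start site.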

Enhancer elements can stimulate transcription up to
1,000-fold from linked homologous or heterologous promoters.
Enhancers are active when placed downstream or upstream from
the transcription initiation site. Many enhancer elements
derived from viruses have a broad host range and are active in
a variety of tissues. For example, the SV40 early gene
enhancer is suitable for many cell types. Other
enhancer/promoter combinations that are suitable for the
present invention include those derived from polyoma virus,
human or murine cytomegalovirus, the long terminal repeat from
various retroviruses such as murine leukemia virus, murine or
Rous sarcoma virus, and HIV. See Enhancers and Eukaryotic
Expression, Cold Spring Harbor Press, Cold Spring Harbor, N.Y.,
1983, which is incorporated herein by reference.

In the construction of the expression cassette, the
promoter is preferably positioned about the same distance from
the heterologous transcription start site as it is from the
transcription start site in its natural setting. As is known
in the art, however, some variation in this distance can be
accommodated without loss of promoter function.

In addition to a promoter sequence, the expression
cassette should also contain a transcription termination
region downstream of the structural gene to provide for
efficient termination. The termination region may be obtained
from the same gene as the promoter sequence or may be obtained
from different genes.

If the mRNA encoded by the structural gene is to be
efficiently translated, polyadenylation sequences are also
commonly added to the vector construct. Two distinct sequence
elements are required for accurate and efficient
polyadenylation: GU- or U-rich sequences located downstream
from the polyadenylation site and a highly conserved sequence
of six nucleotides, AAUAAA, located 11-30 nucleotides
upstream. Termination and polyadenylation signals that are
suitable for the present invention include those derived from
SV40, or a partial genomic copy of a gene already resident on
the expression vector.
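The AAUAAA spacing rule can likewise be checked for a candidate poly(A) site in a construct. This sketch measures the distance from the 3' end of each AAUAAA hexamer to the cleavage site (an assumed convention), accepts DNA or RNA input, and does not assess the downstream GU- or U-rich element. The function and parameter names are hypothetical.

```python
HEXAMER = "AAUAAA"


def polya_signal_distances(seq: str, cleavage_index: int,
                           min_nt: int = 11, max_nt: int = 30) -> list[int]:
    """Distances (nt) from each AAUAAA hexamer to a candidate cleavage site.

    `cleavage_index` is the 0-based index of the first base downstream of the
    cleavage site. Distance is counted from the hexamer's 3' end, an assumed
    convention. DNA input (T) is converted to RNA (U). Only distances within
    [min_nt, max_nt] are returned.
    """
    rna = seq.upper().replace("T", "U")
    hits = []
    start = rna.find(HEXAMER)
    while start != -1:  # step one base at a time so overlapping hexamers are found
        distance = cleavage_index - (start + len(HEXAMER))
        if min_nt <= distance <= max_nt:
            hits.append(distance)
        start = rna.find(HEXAMER, start + 1)
    return hits
```

An empty result suggests that the candidate site lacks a correctly spaced hexamer under this simplified rule.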

In addition to the elements already described, the
expression vector of the present invention may typically
contain other specialized elements intended to increase the
level of expression of cloned nucleic acids or to facilitate
the identification of cells that carry the transfected DNA.
For instance, a number of animal viruses contain DNA sequences
that promote the extrachromosomal replication of the viral
genome in permissive cell types. Plasmids bearing these viral
replicons are replicated episomally as long as the appropriate
factors are provided by genes either carried on the plasmid or
within the genome of the host cell.

The DNA sequence encoding the engineered GFP or BFP 
protein may typically be linked to a cleavable signal peptide 
sequence to promote secretion of the encoded protein by the 
transformed cell. Such signal peptides would include, among 
others, the signal peptides from tissue plasminogen activator, 
insulin, neuron growth factor, and juvenile hormone esterase 
of Heliothis virescens. Additional elements of the cassette 
may include enhancers and, if genomic DNA is used as the 
structural gene, introns with functional splice donor and 
acceptor sites. 

The vector may or may not comprise a eukaryotic 
replicon. If a eukaryotic replicon is present, then the 
vector is amplifiable in eukaryotic cells using the 
appropriate selectable marker. If the vector does not 
comprise a eukaryotic replicon, no episomal amplification is 
possible. Instead, the transfected DNA integrates into the 
genome of the transfected cell, where the promoter directs 
expression of the desired nucleic acid. 

The vectors usually comprise selectable markers
which result in nucleic acid amplification, such as the sodium,
potassium ATPase, thymidine kinase, aminoglycoside
phosphotransferase, hygromycin B phosphotransferase,
xanthine-guanine phosphoribosyl transferase, CAD (carbamyl
phosphate synthetase, aspartate transcarbamylase, and
dihydroorotase), adenosine deaminase, dihydrofolate reductase,
and asparagine synthetase, and ouabain selection.
Alternatively, high yield expression systems not involving
nucleic acid amplification are also suitable, such as a
baculovirus vector in insect cells, with the engineered GFP
or BFP encoding sequence under the direction of the polyhedrin
promoter or other strong baculovirus promoters.

The expression vectors of the present invention will
typically contain both prokaryotic sequences that facilitate
the cloning of the vector in bacteria as well as one or more
eukaryotic transcription units that are expressed only in
eukaryotic cells, such as mammalian cells. The prokaryotic
sequences are preferably chosen such that they do not
interfere with the replication of the DNA in eukaryotic cells.

Any of the well-known procedures for introducing
foreign nucleotide sequences into host cells may be used.
These include the use of calcium phosphate transfection,
polybrene, protoplast fusion, electroporation, liposomes,
microinjection, plasmid vectors, viral vectors and any of the
other well-known methods for introducing cloned genomic DNA,
cDNA, synthetic DNA or other foreign nucleic acid material
into a host cell (see Sambrook et al., supra). It is only
necessary that the particular genetic engineering procedure
utilized be capable of successfully introducing at least one
nucleic acid into the host cell which is capable of expressing
the engineered GFP or BFP protein.

3. Expression in insect cells 

The baculovirus expression vector utilizes the
highly expressed and regulated Autographa californica nuclear
polyhedrosis virus (AcMNPV) polyhedrin promoter modified for
the insertion of foreign nucleic acids. Synthesis of
polyhedrin protein results in the formation of occlusion
bodies in the infected insect cell. The baculovirus vector
utilizes many of the protein modification, processing, and
transport systems that occur in higher eukaryotic cells. The
recombinant eukaryotic proteins expressed using this vector
have been found in many cases to be antigenically,
immunogenically, and functionally similar to their natural
counterparts.

Briefly, a DNA sequence encoding an engineered GFP
or BFP is inserted into a transfer plasmid vector in the
proper orientation downstream from the polyhedrin promoter,
and flanked on both ends with baculovirus sequences. Cultured
insect cells, commonly Spodoptera frugiperda cells, are
transfected with a mixture of viral and plasmid DNAs. The
viruses that develop, some of which are recombinant viruses
that result from homologous recombination between the two
DNAs, are plated at 100-1000 plaques per plate. The plaques
containing recombinant virus can be identified visually
because of their ability to form occlusion bodies or by DNA
hybridization. The recombinant virus is isolated by plaque
purification. The resulting recombinant virus, capable of
expressing engineered GFP or BFP, is self-propagating in that
no helper virus is required for maintenance or replication.
After infecting an insect culture with recombinant virus, one
can expect to find recombinant protein within 48-72 hours. The
infection is essentially lytic within 4-5 days.
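Plating at 100-1000 plaques per plate is commonly reached by serial tenfold dilution of the virus stock. As a worked sketch, the expected plaque count is titer × plated volume ÷ dilution factor. The titer and volume used below are hypothetical numbers, not values from this document, and the function names are illustrative.

```python
def expected_plaques(titer_pfu_per_ml: float, volume_ml: float, dilution_factor: int) -> float:
    """Expected plaque count: titer x volume plated / dilution factor."""
    return titer_pfu_per_ml * volume_ml / dilution_factor


def dilutions_in_range(titer_pfu_per_ml: float, volume_ml: float,
                       low: float = 100, high: float = 1000,
                       max_log10: int = 10) -> list[int]:
    """Tenfold dilution factors (10**k) whose expected count lies within [low, high]."""
    return [
        10 ** k
        for k in range(max_log10 + 1)
        if low <= expected_plaques(titer_pfu_per_ml, volume_ml, 10 ** k) <= high
    ]
```

With a hypothetical titer of 2 × 10^8 pfu/mL and 0.1 mL plated, only the 10^5-fold dilution falls in the range (about 200 plaques); the 10^4-fold dilution would give about 2000 and the 10^6-fold dilution about 20.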

There are a variety of transfer vectors into which 
the engineered GFP or BFP nucleic acid can be inserted. For a 
summary of transfer vectors see Luckow, V.A. and Summers, M.D. 
(1988), Bio/Technology 6:47-55. Preferred is the transfer 
vector pAcUW21 described by Bishop, D.H.L. (1992) in Seminars 
in Virology 3:253-264. 

4. Retroviral Vectors 

Retroviral vectors are particularly useful for
modifying eukaryotic cells because of the high efficiency with
which the retroviral vectors transduce target cells and
integrate into the target cell genome. Additionally, the
retroviruses harboring the retroviral vector are capable of
infecting cells from a wide variety of tissues.

Retroviral vectors are produced by genetically
manipulating retroviruses. Retroviruses are RNA viruses; that
is, the viral genome is RNA. Upon infection, this genomic
RNA is reverse transcribed into a DNA copy which is integrated
into the chromosomal DNA of transduced cells with a high
degree of stability and efficiency. The integrated DNA copy
is referred to as a provirus and is inherited by daughter
cells as is any other gene. The wild type retroviral genome
and the proviral DNA have three genes: the gag, the pol and
the env genes, which are flanked by two long terminal repeat
(LTR) sequences. The gag gene encodes the internal structural
(nucleocapsid) proteins; the pol gene encodes the RNA-directed
DNA polymerase (reverse transcriptase); and the env gene
encodes viral envelope glycoproteins. The 5' and 3' LTRs
serve to promote transcription and polyadenylation of virion
RNAs. Adjacent to the 5' LTR are sequences necessary for
reverse transcription of the genome (the tRNA primer binding
site) and for efficient encapsulation of viral RNA into
particles (the Psi site). See Mulligan, R.C. (1983), In:
Experimental Manipulation of Gene Expression, M. Inouye (ed.),
155-173; Mann, R. et al. (1983), Cell, 33:153-159; Cone, R.D.
and R.C. Mulligan (1984), Proceedings of the National Academy
of Sciences, U.S.A. 81:6349-6353.

The design of retroviral vectors is well known to
one of skill in the art. See Singer, M. and Berg, P. supra.
In brief, if the sequences necessary for encapsidation (or
packaging of retroviral RNA into infectious virions) are
missing from the viral genome, the result is a cis-acting
defect which prevents encapsidation of genomic RNA. However,
the resulting mutant is still capable of directing the
synthesis of all virion proteins. Retroviral genomes from
which these sequences have been deleted, as well as cell lines
containing the mutant genome stably integrated into the
chromosome, are well known in the art and are used to
construct retroviral vectors. Preparation of retroviral
vectors and their uses are described in many publications
including European Patent Application EPA 0 178 220, U.S.
Patent 4,405,712, Gilboa (1986), Biotechniques 4:504-512,
Mann, et al. (1983), Cell 33:153-159, Cone and Mulligan
(1984), Proc. Natl. Acad. Sci. USA 81:6349-6353, Eglitis,
M.A. et al. (1988) Biotechniques 6:608-614, Miller, A.D. et
al. (1989) Biotechniques 7:981-990, Miller, A.D. (1992)
Nature, supra, Mulligan, R.C. (1993), supra, and Gould, B. et
al., and International Patent Application No. WO 92/07943
entitled "Retroviral Vectors Useful in Gene Therapy." The
teachings of these patents and publications are incorporated
herein by reference.

The retroviral vector particles are prepared by
recombinantly inserting the nucleic acid encoding engineered
GFP or BFP into a retrovirus vector and packaging the vector
with retroviral capsid proteins by use of a packaging cell
line. The resultant retroviral vector particle is incapable
of replication in the host cell and is capable of integrating
into the host cell genome as a proviral sequence containing
the engineered GFP or BFP nucleic acid. As a result, the
transduced cell is capable of producing engineered GFP or BFP.

Packaging cell lines are used to prepare the 
retroviral vector particles. A packaging cell line is a 
genetically constructed mammalian tissue culture cell line 
that produces the necessary viral structural proteins required 
for packaging, but which is incapable of producing infectious 
virions. Retroviral vectors, on the other hand, lack the 
structural genes but have the nucleic acid sequences necessary 
for packaging. To prepare a packaging cell line, an 
infectious clone of a desired retrovirus, in which the 
packaging site has been deleted, is constructed. Cells 
comprising this construct will express all structural proteins 
but the introduced DNA will be incapable of being packaged. 
Alternatively, packaging cell lines can be produced by 
transforming a cell line with one or more expression plasmids 
encoding the appropriate core and envelope proteins. In these 
cells, the gag, pol, and env genes can be derived from the 
same or different retroviruses. 
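The division of labour between a packaging cell line and a retroviral vector can be illustrated with a toy model. In this hypothetical sketch, a construct is just a set of element names; a producer cell can form particles only if gag, pol and env are expressed from some construct, and only RNAs carrying the Psi packaging site are packaged. It deliberately ignores recombination between constructs and every other aspect of real virology; it only shows why the resulting vector particles cannot replicate.

```python
from dataclasses import dataclass

STRUCTURAL_GENES = frozenset({"gag", "pol", "env"})


@dataclass(frozen=True)
class Construct:
    """Toy construct: a name plus the set of genetic elements it carries."""
    name: str
    elements: frozenset[str]


def packaged_genomes(constructs: list[Construct]) -> list[Construct]:
    """Constructs whose RNA a producer cell would package into particles."""
    expressed = frozenset().union(*(c.elements for c in constructs))
    if not STRUCTURAL_GENES <= expressed:
        return []  # no structural proteins, so no particles are made at all
    return [c for c in constructs if "psi" in c.elements]


def replication_competent(genome: Construct) -> bool:
    """A packaged genome can spread on its own only if it also encodes gag, pol and env."""
    return STRUCTURAL_GENES <= genome.elements
```

For a packaging construct carrying gag, pol and env but no Psi, together with a vector carrying Psi and a GFP insert, only the vector is packaged, and `replication_competent` returns `False` for it.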

A number of packaging cell lines suitable for the
present invention are available in the prior art. Examples of
these cell lines include Crip, GPE86, PA317 and PG13. See
Miller et al. (1991), J. Virol. 65:2220-2224, which is
incorporated herein by reference. Examples of other packaging
cell lines are described in Cone, R. and Mulligan, R.C.
(1984), Proceedings of the National Academy of Sciences,
U.S.A., 81:6349-6353; Danos, O. and R.C. Mulligan (1988),
Proceedings of the National Academy of Sciences, U.S.A.,
85:6460-6464; and Eglitis, M.A. et al. (1988) Biotechniques
6:608-614, also all incorporated herein by reference.

Packaging cell lines capable of producing retroviral
vector particles with chimeric envelope proteins may be used.
Alternatively, amphotropic or xenotropic envelope proteins,
such as those produced by PA317 and GPX packaging cell lines,
may be used to package the retroviral vectors.

Transforming cells with nucleic acids can involve,
for example, incubating viral vectors (e.g., retroviral or
adeno-associated viral vectors) containing the nucleic acid
with cells within the host range of the vector. See, e.g.,
Methods in Enzymology, Vol. 185, Academic Press, Inc., San
Diego, CA (D.V. Goeddel, ed.) (1990) or M. Krieger (1990),
Gene Transfer and Expression -- A Laboratory Manual, Stockton
Press, New York, NY, and the references cited therein.

5. Transformation with adeno-associated virus 

Adeno-associated viruses (AAVs) require helper
viruses such as adenovirus or herpes virus to achieve
productive infection. In the absence of helper virus
functions, AAV integrates (site-specifically) into a host
cell's genome, but the integrated AAV genome has no pathogenic
effect. The integration step allows the AAV genome to remain
genetically intact until the host is exposed to the
appropriate environmental conditions (e.g., a lytic helper
virus), whereupon it re-enters the lytic life cycle. Samulski
(1993), Current Opinion in Genetics and Development 3:74-80,
and the references cited therein provide an overview of the
AAV life cycle.

AAV-based vectors are used to transduce cells with
target nucleic acids, e.g., in the in vitro production of
nucleic acids and peptides, and in in vivo and ex vivo gene
therapy procedures. See West et al. (1987), Virology
160:38-47; Carter et al. (1989) U.S. Patent No. 4,797,368;
Carter et al. (1993), WO 93/24641; Kotin (1994), Human Gene
Therapy 5:793-801; Muzyczka (1994), J. Clin. Invest. 94:1351;
and Samulski (supra) for an overview of AAV vectors.

Recombinant AAV vectors (rAAV vectors) deliver
foreign nucleic acids to a wide range of mammalian cells
(Hermonat & Muzyczka (1984), Proc. Natl. Acad. Sci. USA
81:6466-6470; Tratschin et al. (1985), Mol. Cell. Biol.
5:3251-3260), integrate into the host chromosome (McLaughlin
et al. (1988), J. Virol. 62:1963-1973), and show stable
expression of the transgene in cell and animal models (Flotte
et al. (1993), Proc. Natl. Acad. Sci. USA 90:10613-10617).
Moreover, unlike some retroviral vectors, rAAV vectors are
able to infect non-dividing cells (Podsakoff et al. (1994), J.
Virol. 68:5656-66; Flotte et al. (1994), Am. J. Respir. Cell
Mol. Biol. 11:517-521). Further advantages of rAAV vectors
include the lack of an intrinsic strong promoter, thus
avoiding possible activation of downstream cellular sequences,
and their naked icosahedral capsid structure, which renders
them stable and easy to concentrate by common laboratory
techniques. rAAV vectors are used to inhibit, e.g., viral
infection, by including anti-viral transcription cassettes in
the rAAV vector which comprise an inhibitor of that infection.

6. Expression in recombinant vaccinia virus-infected cells

The nucleic acid encoding engineered GFP or BFP is
inserted into a plasmid designed for producing recombinant
vaccinia, such as pGS62, Langford, C.L. et al. (1986), Mol.
Cell. Biol. 6:3191-3199. This plasmid consists of a cloning
site for insertion of foreign nucleic acids, the P7.5 promoter
of vaccinia to direct synthesis of the inserted nucleic acid,
and the vaccinia TK gene flanking both ends of the foreign
nucleic acid.

When the plasmid containing the engineered GFP or 
BFP nucleic acid is constructed, the nucleic acid can be 
transferred to vaccinia virus by homologous recombination in 
the infected cell. To achieve this, suitable recipient cells 
are transfected with the recombinant plasmid by standard 
calcium phosphate precipitation techniques into cells already 
infected with the desired strain of vaccinia virus, such as
Wyeth, Lister, WR or Copenhagen. Homologous recombination
occurs between the TK gene in the virus and the flanking TK
gene sequences in the plasmid. This results in a recombinant
virus with the foreign nucleic acid inserted into the viral TK
gene, thus rendering the TK gene inactive. Cells containing
recombinant viruses are selected by adding medium containing
5-bromodeoxyuridine, which is lethal for cells expressing a TK
gene.
Confirmation of production of recombinant virus is
achieved by DNA hybridization using cDNA encoding the
engineered GFP or BFP and by immunodetection techniques using
antibodies specific for the expressed protein. Virus stocks
may be prepared by infection of cells such as HeLa S3 spinner
cells and harvesting of virus progeny.

7. Expression in cell cultures 

GFP- or BFP-encoding nucleic acids can be ligated to
various expression vectors for use in transforming host cell
cultures. The culture of cells used in conjunction with the
present invention is well known in the art. Freshney (1994)
(Culture of Animal Cells, a Manual of Basic Technique, third
edition, Wiley-Liss, New York), Kuchler et al. (1977)
Biochemical Methods in Cell Culture and Virology, Kuchler,
R.J., Dowden, Hutchinson and Ross, Inc., and the references
cited therein provide a general guide to the culture of
cells. Illustrative cell cultures useful for the production
of recombinant proteins, including the engineered GFP or BFP,
include cells of insect or mammalian origin. Mammalian cell
systems often will be in the form of monolayers of cells,
although mammalian cell suspensions are also used.
Illustrative examples of mammalian cell lines include
monocytes, lymphocytes, macrophages, VERO and HeLa cells,
Chinese hamster ovary (CHO) cell lines, WI38, BHK, COS-7 or
MDCK cell lines (see, e.g., Freshney, supra).

As indicated above, the vector, e.g., a plasmid,
which is used to transform the host cell, preferably contains
DNA sequences to initiate transcription and sequences to
control the translation of the engineered GFP or BFP nucleic
acid sequence. These sequences are referred to as expression
control sequences. Illustrative expression control sequences
are obtained from the SV-40 promoter (Science 222:524-527
(1983)), the CMV I.E. (immediate early) promoter (Proc. Natl.
Acad. Sci. 81:659-663 (1984)) or the metallothionein promoter
(Nature 296:39-42 (1982)). The cloning vector containing the
expression control sequences is cleaved using restriction
enzymes, adjusted in size as necessary or desirable, and
ligated with sequences encoding the engineered GFP or BFP
protein by means well known in the art.

The vectors for transforming cells in culture 
typically contain gene sequences to initiate transcription and 
translation of the engineered GFP or BFP gene. These 
sequences need to be compatible with the selected host cell. 
In addition, the vectors preferably contain a marker to 
provide a phenotypic trait for selection of transformed host 
cells such as dihydrofolate reductase or metallothionein.
Additionally, a vector might contain a replicative origin. 

As mentioned above, when higher animal host cells
are employed, polyadenylation or transcription terminator
sequences from known mammalian genes need to be incorporated
into the vector. An example of a terminator sequence is the
polyadenylation sequence from the bovine growth hormone gene.
Sequences for accurate splicing of the transcript may also be
included. An example of a splicing sequence is the VP1 intron
from SV40 (Sprague, J. et al. (1983), J. Virol. 45:773-781).

Additionally, gene sequences to control replication
in the host cell, such as those found in bovine papilloma
virus-type vectors, may be incorporated into the vector
(Saveria-Campo, M. (1985), "Bovine Papilloma Virus DNA: a
Eukaryotic Cloning Vector," in DNA Cloning Vol. II: A Practical
Approach, Ed. D.M. Glover, IRL Press, Arlington, Virginia,
pp. 213-238).

The transformed cells are cultured by means well
known in the art, for example as described in Kuchler, R.J.
et al. (1977), Biochemical Methods in Cell Culture and
Virology.

In addition to the above general procedures which
can be used for preparing recombinant DNA molecules and
transformed unicellular organisms in accordance with the
practices of this invention, other known techniques and
modifications thereof can be used in carrying out the practice
of the invention. Any known system for expression of isolated
genes is suitable for use in the present invention. For
example, viral expression systems such as the baculovirus
expression system are specifically contemplated within the
scope of the invention. Many recent U.S. patents disclose
plasmids, genetically engineered microorganisms, and methods
of conducting genetic engineering which can be used in the
practice of the present invention. For example, U.S. Pat. No.
4,273,875 discloses a plasmid and a process of isolating the
same. U.S. Pat. No. 4,304,863 discloses a process for
producing bacteria by genetic engineering in which a hybrid
plasmid is constructed and used to transform a bacterial host.
U.S. Pat. No. 4,419,450 discloses a plasmid useful as a
cloning vehicle in recombinant DNA work. U.S. Pat. No.
4,362,867 discloses recombinant cDNA construction methods and
hybrid nucleotides produced thereby which are useful in
cloning processes. U.S. Pat. No. 4,403,036 discloses genetic
reagents for generating plasmids containing multiple copies of
DNA segments. U.S. Pat. No. 4,363,877 discloses recombinant
DNA transfer vectors. U.S. Pat. No. 4,356,270 discloses a
recombinant DNA cloning vehicle and is a particularly useful
disclosure for those with limited experience in the area of
genetic engineering, since it defines many of the terms used in
genetic engineering and the basic processes used therein.
U.S. Pat. No. 4,336,336 discloses a fused gene and a method of
making the same. U.S. Pat. No. 4,319,629 discloses plasmid
vectors and the production and use thereof. U.S. Pat. No.
4,332,901 discloses a cloning vector useful in recombinant
DNA. Although some of these patents are directed to the
production of a particular gene product that is not within the
scope of the present invention, the procedures described
therein can easily be adapted to the practice of the
invention described in this specification by those skilled in
the art of genetic engineering. Transferring the isolated GFP
cDNA to other expression vectors will produce constructs which
improve the expression of the GFP polypeptide in E. coli or
express GFP in other hosts.

III. Detection of GFP and BFP Nucleic Acids and Proteins

A. General detection methods 

The nucleic acids and proteins of the invention are
detected, confirmed and quantified by any of a number of means
well known to those of skill in the art. A distinguishing
feature of the expressed proteins of the invention is that
they provide enhanced fluorescence, which can be readily
observed. Fluorescence assays for the expressed proteins are
described in detail below. Other general methods for
detecting both nucleic acids and corresponding proteins
include analytic biochemical methods such as
spectrophotometry, radiography, electrophoresis, capillary
electrophoresis, high performance liquid chromatography
(HPLC), thin layer chromatography (TLC), hyperdiffusion
chromatography, and the like, and various immunological
methods such as fluid or gel precipitin reactions,
immunodiffusion (single or double), immunoelectrophoresis,
radioimmunoassays (RIAs), enzyme-linked immunosorbent assays
(ELISAs), immunofluorescent assays, and the like. The
detection of nucleic acids proceeds by well known methods such
as Southern analysis, northern analysis, gel electrophoresis,
PCR, radiolabeling, scintillation counting, and affinity
chromatography.

A variety of methods of specific DNA and RNA 
measurement using nucleic acid hybridization techniques are 
known to those of skill in the art. For example, one method
for evaluating the presence or absence of engineered GFP or
BFP DNA in a sample involves a Southern transfer (Southern
(1975), J. Mol. Biol. 98:503). Briefly, the digested
genomic DNA is run on agarose slab gels in buffer and 
transferred to membranes. Hybridization is carried out using 
the probes discussed above. Visualization of the hybridized 
portions allows the qualitative determination of the presence 
or absence of engineered GFP or BFP genes. 

Similarly, a Northern transfer may be used for the
detection of engineered GFP or BFP mRNA in samples of RNA from
cells expressing the engineered GFP or BFP gene. In brief,
the mRNA is isolated from a given cell sample using an acid
guanidinium-phenol-chloroform extraction method. The mRNA is
then electrophoresed to separate the mRNA species and the mRNA
is transferred from the gel to a nitrocellulose membrane. As
with the Southern blots, labeled probes are used to identify
the presence or absence of the engineered GFP or BFP
transcript.

The selection of a nucleic acid hybridization format
is not critical. A variety of nucleic acid hybridization
formats are known to those skilled in the art. For example,
common formats include sandwich assays and competition or
displacement assays. Hybridization techniques are generally
described in "Nucleic Acid Hybridization, A Practical
Approach," Ed. Hames, B.D. and Higgins, S.J., IRL Press, 1985;
Gall and Pardue (1969), Proc. Natl. Acad. Sci. USA 63:378-383;
and John, Birnstiel and Jones (1969), Nature 223:582-587.

For example, sandwich assays are commercially useful
hybridization assays for detecting or isolating nucleic acid
sequences. Such assays utilize a "capture" nucleic acid
covalently immobilized to a solid support and a labelled
"signal" nucleic acid in solution. The clinical sample will
provide the target nucleic acid. The "capture" nucleic acid
and "signal" nucleic acid probe hybridize with the target
nucleic acid to form a "sandwich" hybridization complex. To
be effective, the signal nucleic acid cannot hybridize with
the capture nucleic acid.

The nucleic acid sequences used in this invention
can be either positive or negative probes. Positive probes
bind to their targets, and the presence of duplex formation is
evidence of the presence of the target. Negative probes fail
to bind to the suspect target, and the absence of duplex
formation is evidence of the presence of the target. For
example, a wild-type-specific nucleic acid probe or PCR
primer may act as a negative probe in an assay sample
where only the mutant engineered GFP or BFP is present.

Labelled signal nucleic acids, whether those
described herein or others known in the art, are used to detect
hybridization. Complementary nucleic acids or signal nucleic
acids may be labelled by any one of several methods typically
used to detect the presence of hybridized polynucleotides.
One common method of detection is the use of autoradiography
with ³H-, ¹²⁵I-, ³⁵S-, ¹⁴C- or ³²P-labelled probes or the like.
Other labels include ligands which bind to labelled
antibodies, fluorophores, chemiluminescent agents, enzymes,
and antibodies which can serve as specific binding pair
members for a labelled ligand.

Detection of a hybridization complex may require the 
binding of a signal generating complex to a duplex of target 
and probe polynucleotides or nucleic acids. Typically, such 
binding occurs through ligand and anti-ligand interactions as 
between a ligand-conjugated probe and an anti-ligand 
conjugated with a signal. Binding of the signal generation
complex can also be accelerated by exposure to ultrasonic
energy.

The label may also allow indirect detection of the
hybridization complex. For example, where the label is a
hapten or antigen, the complex can be detected by using
antibodies. In these systems, a signal is generated by
attaching fluorescent or enzyme molecules to the antibodies or,
in some cases, by attachment to a radioactive label.
(Tijssen, P. (1985), "Practice and Theory of Enzyme 
Immunoassays," Laboratory Techniques in Biochemistry and 
Molecular Biology, Burdon, R.H., van Knippenberg, P.H., Eds., 
Elsevier, pp. 9-20.) 

The sensitivity of the hybridization assays may be
enhanced through use of a nucleic acid amplification system
which multiplies the target nucleic acid being detected. In
vitro amplification techniques suitable for amplifying
sequences for use as molecular probes or for generating
nucleic acid fragments for subsequent subcloning are known.
Examples of techniques sufficient to direct persons of skill
through such in vitro amplification methods, including the
polymerase chain reaction (PCR), the ligase chain reaction
(LCR), Qβ-replicase amplification and other RNA polymerase
mediated techniques (e.g., NASBA), are found in Berger,
Sambrook, and Ausubel, as well as Mullis et al. (1987), U.S.
Patent No. 4,683,202; PCR Protocols: A Guide to Methods and
Applications (Innis et al., eds), Academic Press Inc., San
Diego, CA (1990) (Innis); Arnheim & Levinson (October 1,
1990), Chem. Eng. News 36-47; J. NIH Res. (1991) 3:81-94;
Kwoh et al. (1989), Proc. Natl. Acad. Sci. USA 86:1173;
Guatelli et al. (1990), Proc. Natl. Acad. Sci. USA 87:1874;
Lomeli et al. (1989), J. Clin. Chem. 35:1826; Landegren et al.
(1988), Science 241:1077-1080; Van Brunt (1990), Biotechnology
8:291-294; Wu and Wallace (1989), Genomics 4:560; Barringer et
al. (1990), Gene 89:117; and Sooknanan and Malek (1995),
Biotechnology 13:563-564. Improved methods of cloning
in vitro amplified nucleic acids are described in Wallace et
al., U.S. Pat. No. 5,426,039. Other methods recently
described in the art are the nucleic acid sequence based
amplification (NASBA™, Cangene, Mississauga, Ontario) and Q
Beta Replicase systems. These systems can be used to directly
identify mutants where the PCR or LCR primers are designed to
be extended or ligated only when a select sequence is present.
Alternatively, the select sequences can be generally amplified
using, for example, nonspecific PCR primers, and the amplified
target region later probed for a specific sequence indicative
of a mutation.
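
The primer-design idea above (primers that are extended only when a select sequence is present) can be illustrated with a deliberately simplified model. This Python sketch assumes that extension requires an exact match over the last k bases at the primer's 3' end and ignores everything else that governs real amplification (melting temperature, mismatch-position effects, polymerase fidelity). The function names and sequences are hypothetical and are not taken from the examples below.

```python
def would_extend(primer: str, target_site: str, k: int = 1) -> bool:
    """Simplified model of allele-specific priming.

    `target_site` is the stretch of target DNA written on the same strand and in
    the same 5'->3' orientation as the primer, so a perfect match means the two
    strings are identical. Extension is assumed to require identity over the
    primer's last k bases (its 3' end); upstream mismatches are ignored here.
    """
    if len(primer) != len(target_site):
        raise ValueError("primer and target_site must have the same length")
    if not 1 <= k <= len(primer):
        raise ValueError("k must be between 1 and the primer length")
    return primer[-k:].upper() == target_site[-k:].upper()


def call_allele(wt_primer: str, mut_primer: str, target_site: str) -> str:
    """Report which of two allele-specific primers the model predicts will extend."""
    wt = would_extend(wt_primer, target_site)
    mut = would_extend(mut_primer, target_site)
    if wt and not mut:
        return "wild type"
    if mut and not wt:
        return "mutant"
    return "ambiguous"


if __name__ == "__main__":
    # Hypothetical 12-nt sites that differ only at the 3'-terminal base (T vs C).
    wt_site = "GGCGCTGCTGCT"
    mut_site = "GGCGCTGCTGCC"
    print(call_allele(wt_site, mut_site, wt_site))   # wild type
    print(call_allele(wt_site, mut_site, mut_site))  # mutant
```

In practice the discriminating base is usually placed at the 3' terminus because that is where a mismatch most strongly affects extension; the model simply encodes that rule.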

Oligonucleotides for use as probes, e.g., in in
vitro amplification methods, for use as gene probes, or as
inhibitor components are typically synthesized chemically
according to the solid phase phosphoramidite triester method
described by Beaucage and Caruthers (1981), Tetrahedron Letts.
22(20):1859-1862, e.g., using an automated synthesizer, as
described in Needham-VanDevanter et al. (1984), Nucleic Acids
Res. 12:6159-6168. Purification of oligonucleotides, where
necessary, is typically performed by either native acrylamide
gel electrophoresis or by anion-exchange HPLC as described in
Pearson and Regnier (1983), J. Chrom. 255:137-149. The
sequence of the synthetic oligonucleotides can be verified
using the chemical degradation method of Maxam and Gilbert
(1980) in Grossman and Moldave (eds.), Academic Press, New
York, Methods in Enzymology 65:499-560.

An alternative means for determining the level of
expression of the engineered GFP or BFP gene is in situ
hybridization. In situ hybridization assays are well known
and are generally described in Angerer et al. (1987), Methods
Enzymol. 152:649-660. In an in situ hybridization assay, cells
are fixed to a solid support, typically a glass slide. If DNA
is to be probed, the cells are denatured with heat or alkali.
The cells are then contacted with a hybridization solution at
a moderate temperature to permit annealing of labelled
GFP- or BFP-specific probes. The probes are
preferably labelled with radioisotopes or fluorescent
reporters.

B. Fluorescence Assay 

When a fluorophore, such as a protein that is capable
of fluorescing, is exposed to light of an appropriate
wavelength, it absorbs the light and then releases the
absorbed energy as emitted light. The range of wavelengths
that a fluorophore is capable of absorbing is the excitation
spectrum, and the range of wavelengths of light that a
fluorophore is capable of emitting is the emission or
fluorescence spectrum. The excitation and fluorescence spectra
for a given fluorophore usually differ and may be readily
measured using known instruments and methods. For example,
scintillation counters and photometers (e.g., luminometers),
photographic film, and solid state devices such as
charge-coupled devices may be used to detect and measure the
emission of light.

The nucleic acids, vectors, and mutant proteins provided
herein, in combination with well known techniques for
overexpressing recombinant proteins, make it possible to
obtain essentially unlimited supplies of homogeneous mutant
GFPs and BFPs. These modified GFPs or BFPs having increased
fluorescent activity can replace wtGFP or other currently
employed tracers in existing diagnostic and assay systems.
Such currently employed tracers include radioactive atoms or
molecules and color-producing enzymes such as horseradish
peroxidase.

The benefits of using the mutants of the present
invention are at least fourfold: the modified GFPs and BFPs
are safer than radioactive-based assays; they can be assayed
quickly and easily; and large numbers of samples can be
handled simultaneously, reducing overall handling and
increasing efficiency. Of great significance, the expression
and subcellular distribution of the fluorescent proteins
within cells can be detected in living tissues without any
experimental manipulation other than placing the cells on a
slide and viewing them through a fluorescence microscope.
This represents a vast improvement over methods of
immunodetection that require fixation and subsequent
labelling.

The modified GFPs and BFPs of the present invention
can be used in standard assays involving a fluorescent marker.
For example, ligand-ligator binding pairs that can be modified
with the mutants of the present invention without disrupting
the ability of each to bind to the other can form the basis of
an assay encompassed by the present invention. These and
other assays are known in the art, and their use with the GFPs
and BFPs of the present invention will become obvious to one
skilled in the art in light of the teachings disclosed herein.
Examples of such assays include competitive assays, in which
labeled and unlabeled ligands competitively bind to a ligator,
and noncompetitive assays, in which a ligand is captured by a
ligator and either measured directly or "sandwiched" with a
secondary ligator that is labeled. Still other types of assays
include immunoassays, single-step homogeneous assays,
multiple-step heterogeneous assays, and enzyme assays.

In a number of embodiments, the mutant GFPs and BFPs
are combined with fluorescence microscopy using known
techniques (see, e.g., Stauber et al., Virol. 213:439-454
(1995)) or, preferably, with fluorescence activated cell
sorting (FACS) to detect and optionally purify or clone cells
that express specific recombinant constructs. For a brief
overview of FACS and its uses, see Herzenberg et al., 1976,
"Fluorescence activated cell sorting," Sci. Am. 234:108;
see also Flow Cytometry and Sorting, eds. Melamed, Mullaney and
Mendelsohn, John Wiley and Sons, Inc., New York, 1979.
Briefly, fluorescence activated cell sorters take a suspension
of cells and pass them single file into the light path of a
laser placed near a detector. The laser usually has a set
wavelength. The detector measures the fluorescent emission
intensity of each cell as it passes through the instrument and
generates a histogram plot of cell number versus fluorescence
intensity. Gates or limits can be placed on the histogram,
thus identifying a particular population of cells. In one
embodiment, the cell sorter is set up to select cells having
the highest fluorescence intensity, usually a small fraction
of the cells in the culture, and to separate these selected
cells away from all the other cells. The intensity level at
which the sorter is set and the fraction of cells selected
depend on the condition of the parent culture and the criteria
of the isolation. In general, the operator should first sort
an aliquot of the culture and record the histogram of
intensity versus number of cells. The operator can then set
the selection level and isolate an appropriate number of the
most active cells. Current fluorescence activated cell sorters
can be equipped with automated cell cloning devices. Such a
device enables one to instruct the instrument to deposit
single selected cells into individual growth wells, where each
is allowed to grow into a monoclonal culture. Thus, genetic
homogeneity is established within the newly cloned culture.
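
The operator workflow described above (record a histogram on a pilot aliquot, set the selection level, then sort the most active cells) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not instrument software: the intensities are simulated, the gate is a single intensity threshold chosen to keep a requested fraction of the pilot aliquot, and all function names are hypothetical.

```python
import random


def histogram(values, n_bins, lo, hi):
    """Count events in equal-width intensity bins over [lo, hi); values outside
    the range are clamped into the first or last bin."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        idx = int((v - lo) / width)
        counts[min(max(idx, 0), n_bins - 1)] += 1
    return counts


def gate_for_top_fraction(aliquot, fraction):
    """Intensity cut-off that keeps roughly the brightest `fraction` of a pilot aliquot."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ranked = sorted(aliquot, reverse=True)
    n_keep = max(1, round(len(ranked) * fraction))
    return ranked[n_keep - 1]


def sort_events(intensities, gate):
    """Indices of events at or above the gate, i.e. the cells that would be collected."""
    return [i for i, v in enumerate(intensities) if v >= gate]


if __name__ == "__main__":
    random.seed(1)
    # Simulated (not measured) log-normal intensities for a pilot aliquot and a main run.
    aliquot = [random.lognormvariate(3.0, 0.6) for _ in range(2000)]
    culture = [random.lognormvariate(3.0, 0.6) for _ in range(20000)]
    print(histogram(aliquot, 10, 0.0, 100.0))
    gate = gate_for_top_fraction(aliquot, 0.02)
    selected = sort_events(culture, gate)
    print(f"gate={gate:.1f}, selected {len(selected)} of {len(culture)} events")
```

Deriving the gate from a pilot aliquot and then applying it unchanged to the rest of the culture mirrors the "sort an aliquot first, then set the selection level" sequence; a real sorter applies the gate to each event in real time rather than to a stored list.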

IV. General Applications for the GFP Mutants 

The mutant GFP and BFP sequences described here have a wide
range of uses, particularly as signal or reporter sequences
for the co-expression of other nucleic acid sequences of
interest and/or to track the location and/or movement of
other sequences within the cell, within tissue and the like.
For example, these reporter-type sequences could be used to
track the spread (or lack thereof) of a disease causal agent
in drug screening assays or could readily be used in
diagnostics. Some of the more interesting applications are
described below.

A. Protein Trafficking 

Normally, expressed mutant GFPs and BFPs are
distributed throughout the cell (particularly in mammalian
cells), except for the nucleolus. However, as described
below, when a GFP mutant is fused to the HIV-1 Rev protein, a
hybrid molecule results which retains the Rev function and is
localized mainly in the nucleolus, where Rev is found. Fusion
to the N-terminal domain of the HIV-1 Nef protein produces a
hybrid protein detectable in the plasma membrane. Thus, the
GFP mutants can be used to monitor the subcellular targeting
and transport of proteins to which they are fused.

B. Gene Therapy

The mutant GFPs described here have interesting and
useful applications in gene therapy. Gene therapy in general
is the correction of genetic defects by insertion of exogenous
cellular genes that encode a desired function into cells that
lack that function, such that the expression of the exogenous
gene a) corrects a genetic defect or b) causes the destruction
of cells that are genetically defective. Methods of gene
therapy are well known in the art; see, for example, Lu, M.,
et al. (1994), Human Gene Therapy 5:203; Smith, C. (1992), J.
Hematotherapy 1:155; Cassel, A., et al. (1993), Exp. Hematol.
21:585; Larrick, J.W. and Burck, K.L., Gene Therapy:
Application of Molecular Biology, Elsevier Science Publishing
Co., Inc., New York, New York (1991); and Kriegler, M., Gene
Transfer and Expression: A Laboratory Manual, W.H. Freeman and
Company, New York (1990), each incorporated herein by
reference. One modality of gene therapy involves (a) obtaining
from a patient a viable sample of primary cells of a
particular cell type; (b) inserting into these primary cells a
nucleic acid segment encoding a desired gene product; (c)
identifying and isolating cells and cell lines that express
the gene product; (d) reintroducing into the patient cells
that express the gene product; (e) removing from the patient
an aliquot of tissue including cells resulting from step (c)
and their progeny; and (f) determining the quantity of the
cells resulting from step (c) and their progeny in said
aliquot. Introducing a polycistronic vector that encodes GFP
or BFP in addition to the desired gene allows for the quick
identification, in step (c), of viable cells that contain and
express the desired gene.

Another gene therapy modality involves inserting the
desired nucleic acid into selected tissue cells in situ, for
example into cancerous or diseased cells, by contacting the
target cells in situ with retroviral vectors that encode the
gene product in question. Here, it is important to quickly
and reliably assess which cells, and what proportion of cells,
have been transfected. Co-expression of GFP or BFP permits a
quick assessment of the proportion of cells that are
transfected and of their levels of expression.

C. Diagnostics 

One potential application of the GFP/BFP variants is 
in diagnostic testing. The GFP/BFP gene, when placed under 
the control of promoters induced by various agents, can serve 
as an indicator for these agents. Established cell lines or 
cells and tissues from transgenic animals carrying GFP/BFP 
expressed under the desired promoter will become fluorescent
in the presence of the inducing agent.

Viral promoters which are transactivated by the
corresponding virus, promoters of heat shock genes which are 
induced by various cellular stresses as well as promoters 
which are sensitive to organismal responses, e.g. 
inflammation, can be used in combination with the described 
GFP/BFP mutants in diagnostics. 

In addition, the effect of selected culture
conditions and components (salt concentrations, pH,
temperature, trans-acting regulatory substances, hormones,
cell-cell contacts, ligands of cell surface and internal
receptors) can be assessed by incubating, under those
conditions, cells in which sequences encoding the fluorescent
proteins provided herein are operably linked to nucleic acids
(especially regulatory elements such as promoters) derived
from a selected gene, and detecting the expression and
location of fluorescence.

D. Toxicology 

Another application of the GFP/BFP-based 
methodologies is in the area of toxicology. Assessment of the 
mutagenic potential of any compound is a prerequisite for its 
use. Until recently, the Ames assay in Salmonella and tests 
based on chromosomal aberrations or sister chromatid exchanges 
in cultured mammalian cells were the main tools in toxicology. 
However, both assays are of limited sensitivity and 
specificity and do not allow studies on mutation induction in 
various organs or tissues of the intact organism. 

The introduction of transgenic mice with a 
mutational target in a shuttle vector has made possible the 
detection of induced mutations in different tissues in vivo. 
The assay involves DNA isolation from tissues of exposed mice, 
packaging of the target DNA into bacteriophage lambda 
particles and subsequent infection of E. coli. The mutational 
target in this assay is either the lacZ or the lacI gene, and
quantitation of blue versus white plaques on the bacterial
lawn allows for mutagenic assessment.

GFP/BFP could significantly simplify both the tissue
culture and transgenic mouse procedures. Expression of
GFP/BFP under the control of a repressor, which in turn is
driven by the promoter of a constitutively expressed gene,
would establish a rapid method for evaluating the mutagenic
potential of an agent. The presence of fluorescent cells,
following exposure of a cell line, tissue or whole animal
carrying the GFP/BFP-based detection construct, will reflect
the mutagenicity of the compound in question. Because GFP/BFP
expression is controlled by the target DNA (the repressor
gene), the fluorescent protein will be synthesized only when
the repressor is inactivated or turned off, or when the
repressor recognition sequences are mutated. Direct
visualization of the detector
cell line or tissue biopsy can qualitatively assess the 
mutagenicity of the agent, while FACS of the dissociated cells 
can provide for quantitative analysis. 
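
For the quantitative FACS readout mentioned above, the basic arithmetic is the frequency of fluorescent detector cells compared with unexposed controls. The Python sketch below assumes that each sample is summarized as fluorescent and total event counts and that the comparison of interest is a simple fold change; the record type and function are hypothetical, the counts are invented, and a real assessment would also need replicate samples and a statistical test, which are not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectorCounts:
    """FACS counts for one sample of detector cells (hypothetical record type)."""
    fluorescent: int  # events above the GFP/BFP gate
    total: int        # all events analysed

    @property
    def frequency(self) -> float:
        if self.total <= 0:
            raise ValueError("total must be positive")
        return self.fluorescent / self.total


def fold_over_control(exposed: DetectorCounts, control: DetectorCounts) -> float:
    """Fluorescent-cell frequency after exposure, relative to unexposed controls."""
    if control.fluorescent == 0:
        raise ValueError("no fluorescent control events; fold change undefined")
    return exposed.frequency / control.frequency


if __name__ == "__main__":
    # Made-up counts for illustration only.
    control = DetectorCounts(fluorescent=25, total=100_000)
    exposed = DetectorCounts(fluorescent=250, total=100_000)
    print(f"{exposed.frequency:.2e} vs {control.frequency:.2e}; "
          f"fold = {fold_over_control(exposed, control):.1f}")
```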

E. Drug Screening

The GFP/BFP detection system could also
significantly expedite and reduce the cost of some current
drug screening procedures. A dual color screening system
(DCSS), in which GFP is placed under the promoter of a target
gene and BFP is expressed from a constitutive promoter, could
provide for rapid analysis of agents that specifically affect
the target gene. Established cell lines with the DCSS could
be screened with hundreds of compounds in a few hours. A drug
acting specifically on the target gene will influence only the
expression of GFP, whereas non-specific or cytotoxic effects
will be detected by the second marker, BFP. The advantages of
this system are that no exogenous substances are required for
GFP and BFP detection, that the assay can be used with single
cells, cell populations, or cell extracts, and that the same
detection technology and instrumentation are used for very
rapid and non-destructive detection.
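
The read-out logic of the DCSS (loss of green with retained blue suggests a target-specific agent; loss of blue suggests a non-specific or cytotoxic one) can be written as a small decision rule. This Python sketch assumes plate-reader style intensity readings per channel, simple background subtraction, and arbitrary example cut-offs; none of these thresholds, names, or readings come from the source.

```python
def signal_ratio(treated: float, control: float, background: float = 0.0) -> float:
    """Background-subtracted treated/control fluorescence ratio for one channel."""
    denom = control - background
    if denom <= 0:
        raise ValueError("control signal must exceed background")
    return max(treated - background, 0.0) / denom


def classify_compound(green: float, blue: float,
                      green_cutoff: float = 0.5, blue_floor: float = 0.8) -> str:
    """Interpret GFP (target promoter) and BFP (constitutive promoter) ratios.

    green, blue: treated/control ratios, e.g. from signal_ratio().
    green_cutoff, blue_floor: assumed example thresholds, not values from the text.
    """
    if blue < blue_floor:
        return "non-specific or cytotoxic"  # the constitutive BFP signal dropped too
    if green < green_cutoff:
        return "specific"                   # only the target-driven GFP signal dropped
    return "no effect"


if __name__ == "__main__":
    # Hypothetical readings: (GFP treated, GFP control, BFP treated, BFP control).
    wells = {"compound A": (300.0, 1000.0, 950.0, 1000.0),
             "compound B": (250.0, 1000.0, 200.0, 1000.0),
             "compound C": (980.0, 1000.0, 1010.0, 1000.0)}
    for name, (g_t, g_c, b_t, b_c) in wells.items():
        print(name, classify_compound(signal_ratio(g_t, g_c), signal_ratio(b_t, b_c)))
```

The BFP check is applied first so that a compound that lowers both signals is never reported as specific, which is the role the second marker plays in the DCSS.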

The search for antiviral agents which specifically
block viral transcription without affecting cellular
transcription could be significantly improved by the DCSS.
In the case of HIV, appropriate cell lines expressing GFP
under the HIV LTR and BFP under a cellular constitutive
promoter could identify compounds which selectively inhibit
HIV transcription. Reduction of only the green, but not the
blue, fluorescent signal will indicate drug specificity for
the HIV promoter. Similar approaches could also be designed
for other viruses.

Furthermore, the search for antiparasitic agents
could also be helped by the DCSS. Established cell lines,
transgenic nematodes, or even parasitic extracts in which
expression of GFP depends on parasite-specific trans-splicing
sequences while BFP is under the control of host-specific
cis-splicing elements could provide a rapid screen for
selective antiparasitic drugs.

The invention will be more readily understood by
reference to the following specific examples, which are
included for purposes of illustration only and are not
intended to limit the invention unless so stated.

EXAMPLES 

The following general protocol was used to generate 
mutant GFP- or BFP-encoding nucleic acids, transform host 
cells, and express the mutant GFP and BFP proteins: 

• Clone a nucleic acid that encodes either wtGFP or
BFP (Tyr67→His), under the control of eukaryotic or
prokaryotic promoters, into a standard ds-DNA plasmid

• Convert the plasmid vector to ss-DNA by standard
methods

• Anneal the ss-DNA to 40-50 nucleotide DNA oligomers
having base mismatches at the site(s) intended to be
engineered

• Convert the ss-DNA to a closed ds-DNA plasmid vector by
use of DNA polymerase and standard protocols

• Identify plasmids containing the desired mutations by
restriction analysis following plasmid DNA isolation from
E. coli strains transformed with the mutagenized DNA

• Verify the presence of mutations by DNA sequencing

• Transfect human transformed embryonic kidney 293 cells
with equal amounts of DNA from the appropriate plasmids

• Compare the fluorescence intensity of the signals
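
To make the oligomer step in the protocol above concrete: a mutagenic oligomer is the target region with the desired codon change written in, long enough (40 to 50 nt, as stated) that the mismatch is flanked by perfectly matched sequence on both sides. The Python sketch below assumes that centring the change in the oligomer is desirable and that the template is given as its sense-strand sequence. Whether the oligomer must be reverse-complemented depends on which strand the ss-DNA carries, which the protocol does not state. The template and all function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T, either case)."""
    return seq.translate(COMPLEMENT)[::-1]


def mutagenic_oligo(sense: str, codon_start: int, new_codon: str,
                    length: int = 45) -> tuple[int, str]:
    """Return (start, oligo): a `length`-nt stretch of the sense strand with
    `new_codon` written in at 0-based `codon_start`, centred on the codon
    unless the end of the template forces a shift."""
    if len(new_codon) != 3:
        raise ValueError("new_codon must be 3 nt")
    if not 40 <= length <= 50:
        raise ValueError("the protocol above uses 40-50 nt oligomers")
    if codon_start < 0 or codon_start + 3 > len(sense) or len(sense) < length:
        raise ValueError("codon or oligomer does not fit inside the template")
    mutated = sense[:codon_start] + new_codon + sense[codon_start + 3:]
    # Put the codon's middle base at (or next to) the oligo midpoint, then clip.
    start = max(0, codon_start + 1 - length // 2)
    start = min(start, len(mutated) - length)
    return start, mutated[start:start + length]


def mismatch_positions(a: str, b: str) -> list[int]:
    """0-based positions at which two equal-length sequences differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


if __name__ == "__main__":
    # Hypothetical sense-strand template with a TAC (Tyr) codon at position 30.
    template = "GCT" * 10 + "TAC" + "GGA" * 10
    start, oligo = mutagenic_oligo(template, 30, "CAC")  # Tyr (TAC) -> His (CAC)
    print(start, len(oligo), oligo)
    print(mismatch_positions(template[start:start + len(oligo)], oligo))  # [21]
    # If the ss-DNA template is the sense strand, anneal the reverse complement.
    print(reverse_complement(oligo))
```

A Tyr-to-His change such as TAC to CAC needs only one mismatched base, so the oligomer anneals almost perfectly while still carrying the mutation into the newly synthesized strand.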

Nucleic acids and vectors

The wtGFP cDNA (SEQ ID NO:1) was obtained from Dr.
Chalfie of Columbia University. All mutants described were 
obtained by modifying this wtGFP sequence as detailed below. 

The vectors used to clone and to express the GFPs 
and BFPs are derivatives of the commercially available 
plasmids pcDNA3 (Invitrogen, San Diego, CA), pBSSK+
(Stratagene, La Jolla, CA) and pET11a (Novagen, Madison, WI).

wtGFP protein expression in mammalian cells

Several vectors for the expression of GFP in 
mammalian cells were constructed: 

pFRED4 carries the wtGFP sequences under the control of the
cytomegalovirus (CMV) early promoter and the polyadenylation
signal of the Human Immunodeficiency Virus-1 (HIV) 3' Long
Terminal Repeat (LTR). To derive pFRED4, we amplified the GFP
coding sequence from plasmid #TU58 (Chalfie et al., 1994) by
the polymerase chain reaction (PCR). For PCR amplification of
the GFP coding region, oligonucleotides #16417 and #16418 were
used as primers. Oligonucleotide #16417,
5'-GGAGGCGCGCAAGAAATGGCTAGCAAAGGAGAAGA-3' (SEQ ID NO:3),
containing the BssHII recognition sequence and the translation
initiation sequence of the HIV-1 Tat protein, was the sense
primer. The antisense primer, #16418,
5'-GCGGGATCCTTATTTGTATAGTTCATCCATGCCATG-3' (SEQ ID NO:4),
contained the BamHI recognition sequence. The amplified
fragment was digested with BssHII and BamHI and cloned into
BssHII- and BamHI-digested pCMV37M1-10D, a plasmid containing
the CMV early promoter and the HIV-1 p37gag region, followed
by several cloning sites and the HIV-1 3' LTR. Thus the
p37gag gene was replaced by GFP, resulting in pFRED4.
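
The restriction sites built into the two primers can be checked mechanically. The Python sketch below scans each primer for the BssHII (GCGCGC) and BamHI (GGATCC) recognition sequences named above, plus NheI (GCTAGC), since pFRED7 is later cut with NheI; the text does not say where that NheI site lies, so its position in primer #16417 is an observation of this sketch. All three sites are palindromic, so scanning one strand is enough. The sketch also reverse-complements the antisense primer so that it can be read on the coding strand, where it ends in a TAA stop codon followed by the BamHI site. It is an illustrative check, not part of the original procedure.

```python
SITES = {"BssHII": "GCGCGC", "NheI": "GCTAGC", "BamHI": "GGATCC"}  # 5'->3', palindromic
COMPLEMENT = str.maketrans("ACGT", "TGCA")

PRIMER_16417 = "GGAGGCGCGCAAGAAATGGCTAGCAAAGGAGAAGA"   # sense, SEQ ID NO:3
PRIMER_16418 = "GCGGGATCCTTATTTGTATAGTTCATCCATGCCATG"  # antisense, SEQ ID NO:4


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(seq: str) -> dict[str, list[int]]:
    """Enzyme name -> 0-based start positions of its recognition sequence in `seq`."""
    hits = {}
    for name, site in SITES.items():
        starts = [i for i in range(len(seq) - len(site) + 1) if seq.startswith(site, i)]
        if starts:
            hits[name] = starts
    return hits


if __name__ == "__main__":
    print(find_sites(PRIMER_16417))  # {'BssHII': [4], 'NheI': [18]}
    print(find_sites(PRIMER_16418))  # {'BamHI': [3]}
    coding = reverse_complement(PRIMER_16418)
    print(coding)                    # CATGGCATGGATGAACTATACAAATAAGGATCCCGC
    # Read in triplets from position 0: ... TAC AAA TAA (Tyr, Lys, stop), then GGATCC.
    stop = coding.index("TAA")
    print(stop, stop % 3 == 0)       # 24 True
```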

In a second step, the 1485 bp fragment from pFRED4,
generated from StuI and BamHI double digestion, was subcloned
into the 4747 bp vector derived from the NruI and BamHI double
digestion of pcDNA3. The resulting plasmid, pFRED7 (SEQ ID
NO:5), expresses GFP under the control of the early CMV
promoter and the bovine growth hormone polyadenylation signal.

Bacterial expression

For bacterial expression, we constructed plasmid
pBSGFP (SEQ ID NO:6), a pBSSK+ derivative carrying wtGFP.
pBSGFP was generated by inserting the GFP-containing region of
pFRED4, digested with BssHII and BamHI and subsequently
treated with Klenow, into the EcoRV-digested pBSSK+ vector.
In pBSGFP the wtGFP is fused downstream of the 43 amino acids
of the alpha peptide of β-galactosidase present in the
pBSSK+ polylinker region. The added amino acids at the
N-terminus of wtGFP have no apparent effect on the GFP signal,
as judged from subsequent plasmids containing precise
deletions of the extra amino acids.

For GFP overexpression and purification we generated
plasmid pFRED13 (SEQ ID NO: 7) by ligating the 717 bp fragment
from pFRED7 digested with NheI and BamHI to the 5644 bp
fragment resulting from the NheI and BamHI double digestion of
pET11a. In pFRED13, GFP is synthesized under the control of
the bacteriophage T7 φ10 promoter.

The oligonucleotides used for GFP mutagenesis were 
synthesized by the DNA Support Services of the ABL Basic 
Research Program of the National Cancer Institute. DNA 
sequencing was performed by the PCR-assisted fluorescent 
terminator method (ReadyReaction DyeDeoxy Terminator Cycle
Sequencing Kit, ABI, Columbia, MD) according to the
manufacturer's instructions. Sequencing reactions were
resolved on the ABI Model 373A DNA Sequencing System.
Sequencing data were analyzed using the Sequencher program
(Gene Codes, Ann Arbor, MI).

Enzymes were purchased from New England Biolabs 
(Beverly, MA) and used according to conditions described by 
the supplier. Chemicals used for the purification of wild 
type and mutant proteins were purchased from SIGMA (St. Louis,
MO). Tissue culture media were obtained from Biofluids
(Rockville, MD) and GIBCO/BRL (Gaithersburg, MD). Competent
bacterial cells were purchased from GIBCO/BRL.

Preparation of mutants 

Initially, plasmid pBSGFP was used to mutagenize the
GFP coding sequence by single-stranded DNA site-directed
mutagenesis, as described by Schwartz et al. (1992) J. Virol.
66:7176. In addition to changing specific codons, our
strategy was also to improve GFP expression by replacing
potentially inhibitory nucleotide sequences without altering the
GFP amino acid sequence. This approach has been successfully
employed in the past for other proteins (Schwartz et al.
(1992) J. Virol. 66:7176).
For the pBSGFP mutagenesis the following
oligonucleotides were used:

#17422 (SEQ ID NO: 8):
5'-CAATTTGTGTCCCAGAATGTTGCCATCTTCCTTGAAGTCAATACCTTT-3'
#17423 (SEQ ID NO: 9):
5'-GTCTTGTAGTTGCCGTCATCTTTGAAGAAGATGCTCCTTTCCTGTAC-3'
#17424 (SEQ ID NO: 10):
5'-CATGGAACAGGCAGTTTGCCAGTAGTGCAGATGAACTTCAGGGTAAGTTTTC-3'
#17425 (SEQ ID NO: 11):
5'-CTCCACTGACAGAGAACTTGTGGCCGTTAACATCACCATC-3'
#17426 (SEQ ID NO: 12):
5'-CCATCTTCAATGTTGTGGCGGGTCTTGAAGTTCACTTTGATTCCATT-3'
#17465 (SEQ ID NO: 13):
5'-CGATAAGCTTGAGGATCCTCAGTTGTACAGTTCATCCATGC-3'

Oligonucleotide #17426 introduces a mutation in GFP,
converting the isoleucine (Ile) at position 168 into threonine
(Thr). The Ile168Thr change has been shown to alter the GFP
spectrum and also to increase the intensity of GFP
fluorescence by almost two-fold at the emission maximum (Heim
et al. (1994), supra).

The mutagenesis mixture was used to transform DH5α
competent E. coli cells. Ampicillin-resistant colonies were
obtained and examined for their fluorescent properties by
excitation with UV light. One colony, significantly brighter
than the rest, was apparent on the agar plate. This colony
was further purified, and its plasmid DNA was isolated and used to
transform DH5α competent bacteria. This time all the colonies
were bright green when excited with UV light, indicating
that the bright green fluorescence was associated with the
presence of the plasmid. The sequence of the GFP segment
(SEQ ID NO: 14, representing only the segment and not the whole
plasmid) of this plasmid, called pBSGFPsg11, was then
determined. The sequence analysis revealed that, in addition
to the designed nucleotide changes, which do not alter the
amino acid sequence of GFP, and the Ile168Thr mutation, a
second, spontaneous mutation had occurred. A thymidine at
position 322 of SEQ ID NO: 14, which is the GFP-coding region
of the pBSGFPsg11 DNA, was replaced by a cytosine. This
nucleotide change converts the phenylalanine (Phe) at position
65 of the GFP amino acid sequence into a leucine (Leu). A
series of experiments, described below, demonstrated
that the Phe65Leu mutation was indeed responsible
for the increase in the intensity of the fluorescent GFP
signal.
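The codon-level effect of the spontaneous T→C change can be illustrated with the wild-type coding sequence of SEQ ID NO: 1 from the sequence listing. Codon n occupies nucleotides 3n-2 to 3n counted from the A of the ATG, so codon 65 is nucleotides 193-195 (TTC, Phe), and replacing its first base with C gives CTC, which encodes Leu. This numbering is relative to the ATG of SEQ ID NO: 1, not to SEQ ID NO: 14, where the same base is numbered 322, presumably because that segment also contains sequence upstream of the ATG. The Python sketch below is illustrative only; it includes just the first 80 codons, and its helper names are hypothetical.

```python
"""Sketch: show that a T->C change at the first base of codon 65 turns Phe into Leu."""

BASES = "TCAG"
# Standard genetic code, amino acids listed in TCAG x TCAG x TCAG order ('*' = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Codons 1-80 of wild-type GFP, copied from SEQ ID NO: 1 (one listing row per line).
WT_CODONS_1_80 = (
    "ATGGCTAGCAAAGGAGAAGAACTCTTCACTGGAGTTGTCCCAATTCTT"
    "GTTGAATTAGATGGTGATGTTAATGGGCACAAATTTTCTGTCAGTGGA"
    "GAGGGTGAAGGTGATGCAACATACGGAAAACTTACCCTTAAATTTATT"
    "TGCACTACTGGAAAACTACCTGTTCCATGGCCAACACTTGTCACTACT"
    "TTCTCTTATGGTGTTCAATGCTTTTCAAGATACCCGGATCATATGAAA"
)


def translate(cds: str) -> str:
    """Translate the complete codons of a DNA string into one-letter amino acids."""
    return "".join(CODON_TABLE[cds[i : i + 3]] for i in range(0, len(cds) - 2, 3))


def substitute(cds: str, nt_position: int, base: str) -> str:
    """Replace the base at a 1-based nucleotide position."""
    i = nt_position - 1
    return cds[:i] + base + cds[i + 1 :]


def codon_at(cds: str, codon_number: int) -> str:
    """Return codon n (1-based), i.e. nucleotides 3n-2..3n."""
    start = 3 * (codon_number - 1)
    return cds[start : start + 3]


if __name__ == "__main__":
    mutant = substitute(WT_CODONS_1_80, 193, "C")  # first base of codon 65
    print(codon_at(WT_CODONS_1_80, 65), "->", codon_at(mutant, 65))  # TTC -> CTC
    print(translate(WT_CODONS_1_80)[64], "->", translate(mutant)[64])  # F -> L
```

The same helpers show why the BFP parent carries His at 67: a T→C change at nucleotide 199 turns codon 67 from TAT (Tyr) into CAT (His).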

In subsequent experiments, involving the generation of
rationally designed GFP mutant combinations detailed
below, we also used the single-stranded DNA site-directed
mutagenesis approach. This time, however, the template DNAs
were pFRED7 derivatives instead of pBSGFP.

Transfection and expression

The 293 cell line, an adenovirus-transformed human
embryonal kidney cell line (Graham et al. (1977), J. Gen.
Virol. 5:59), was used for protein expression analysis. The
cells were cultured in Dulbecco's modified Eagle medium
(DMEM) supplemented with 10% heat-inactivated fetal bovine
serum (FBS, Biofluids).

Transfection was performed by the calcium phosphate
coprecipitation technique as previously described (Graham et
al. (1973), Virol. 52:456; Felber et al. (1990), J. Virol.
64:3734). Plasmid DNA was purified on Qiagen columns according
to the manufacturer's instructions (Qiagen). A mix of 5 to 10
µg of total DNA per ml of final precipitate was overlaid on
the cells in 60 mm dishes or in 6- and 12-well tissue culture plates
(Falcon), using 0.5, 0.25 and 0.125 ml of precipitate,
respectively. After overnight incubation, the cells were
washed, placed in medium without phenol red, and measured in a
plate spectrofluorometer, e.g., Cytofluor II (Perceptive
Biosystems, Framingham, MA).
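The precipitate volumes above determine how much DNA each culture receives. The short Python calculation below is plain arithmetic on the stated numbers, not an additional protocol step; the dictionary keys and function name are illustrative.

```python
"""Sketch: DNA delivered per culture vessel in the calcium phosphate transfection."""

# Precipitate volume (ml) overlaid per vessel, as stated in the protocol.
PRECIPITATE_ML = {
    "60 mm dish": 0.5,
    "6-well plate, per well": 0.25,
    "12-well plate, per well": 0.125,
}


def dna_per_vessel(dna_ug_per_ml: float, precipitate_ml: float) -> float:
    """Micrograms of DNA contained in the precipitate overlaid on one vessel."""
    return dna_ug_per_ml * precipitate_ml


if __name__ == "__main__":
    # Evaluate both ends of the stated 5-10 ug/ml range.
    for vessel, volume in PRECIPITATE_ML.items():
        low, high = dna_per_vessel(5.0, volume), dna_per_vessel(10.0, volume)
        print(f"{vessel}: {low:g}-{high:g} ug DNA")
```

At the stated concentrations this works out to 2.5-5 µg of DNA per 60 mm dish, 1.25-2.5 µg per well of a 6-well plate, and 0.625-1.25 µg per well of a 12-well plate.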

Purification of wild-type and mutant proteins

E. coli strains carrying pFRED13 or other pET11a
derivatives with mutant GFP genes were used for the
overproduction and purification of the wt and mutant GFPs or
BFPs. The cells were grown in 1 liter of LB broth containing 100
µg/ml ampicillin at 32°C to a density of 0.6-0.8 optical
density units at 600 nm. At this point, the cells were
induced with 0.6 mM IPTG and incubated for four more hours.
Following harvesting of the cell pellets, cellular extracts
were prepared as described by Johnson, B.H. and Hecht, M.H.
(1994), Biotechnol. 12:1357.

GFPs and BFPs were purified from the cellular
extracts as follows. Ammonium sulfate (AS) was first added to
the extracts (50 g AS per 100 g supernatant) to precipitate the
proteins. The precipitates were collected by centrifugation
at 7500 x g for 15 min, and the pellets were dissolved in 5 ml
of 1 M AS. The samples were then loaded on a phenyl-Sepharose
column (HR10/10, Pharmacia, Piscataway, NJ) and washed with 20
mM 2-(N-morpholino)ethanesulfonic acid (MES), pH 5.6, and 1 M
AS. Proteins were eluted with a 45 ml gradient to 20 mM MES,
pH 5.6. Fractions containing the GFP or BFP protein were
visibly colored even under ordinary visible light.

Green- or blue-colored fractions were further
purified on Q-Sepharose (Mono Q, HR5/5, Pharmacia) with a 20
ml gradient from 20 mM Tris, pH 7.0, to 20 mM Tris, pH 7.0, 0.25
M NaCl.

The AS precipitation step was performed at 4°C,
while the chromatographic procedures were performed at room
temperature.
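Both chromatography steps use gradients defined only by their end points and total volumes. Assuming both gradients are linear (the text does not state the gradient shape), the salt concentration at a given elution volume can be estimated as in the Python sketch below; the function name is hypothetical.

```python
"""Sketch: estimate salt concentration along an elution gradient.

Assumption: the gradients are linear in elution volume; the source gives
only the start/end conditions and the total gradient volumes.
"""


def gradient_concentration(
    start_m: float, end_m: float, total_ml: float, eluted_ml: float
) -> float:
    """Concentration (M) after eluted_ml of a linear gradient of total_ml."""
    if not 0.0 <= eluted_ml <= total_ml:
        raise ValueError("eluted volume must lie within the gradient")
    fraction = eluted_ml / total_ml
    return start_m + (end_m - start_m) * fraction


if __name__ == "__main__":
    # Phenyl-Sepharose: 45 ml from 1 M ammonium sulfate down to 0 M (20 mM MES only).
    print(gradient_concentration(1.0, 0.0, 45.0, 22.5))
    # Mono Q: 20 ml from 0 to 0.25 M NaCl in 20 mM Tris, pH 7.0.
    print(gradient_concentration(0.0, 0.25, 20.0, 10.0))
```

Under the linearity assumption, the midpoint of the phenyl-Sepharose gradient corresponds to 0.5 M AS and the midpoint of the Mono Q gradient to 0.125 M NaCl.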

Determination of protein concentration 

Protein concentrations were determined using the 
commercially available Bradford protein assay (BioRad, 
Hercules, CA) with bovine IgG protein as a standard. 

Analytical polyacrylamide gels

Analytical polyacrylamide gel electrophoresis was 
used to visualize the degree of purity of the purified GFP or 
BFP proteins. In all cases, 1 mm thick, 12% acrylamide gels 
(containing 0.1% SDS, in Tris buffer, pH 7.4) were used, and 
electrophoresis was performed for 2 hours at 120 V. Gels were 
stained with Coomassie Blue to visualize the proteins. 
Fluorescence measurements

Excitation and emission spectra of solutions of the
fluorescent proteins were obtained using a Perkin Elmer LS50B
spectrofluorimeter (Perkin Elmer, Advanced Biosystems, Foster
City, CA).

The relative fluorescence data for the GFP mutants
in Table II below were obtained by comparing the cellular
fluorescence of the GFP mutants expressed in the transformed
human embryonic kidney cell line 293 with that of wtGFP expressed in
the same cell line. Likewise, the relative fluorescence data
for the BFP mutants in Table II below were obtained by
comparing the cellular fluorescence of the BFP mutants
expressed in 293 cells with that of BFP(Tyr67→His) expressed in the
same cell line. Equal amounts of DNA encoding wild-type or
mutant proteins were introduced into 293 cells. Cellular
fluorescence was quantified 24 or 48 h post-transfection
using the Cytofluor II.
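Relative fluorescence in this comparison is a ratio of cellular fluorescence readings for a mutant and its reference protein (wtGFP, or BFP(Tyr67→His) for the blue mutants) measured under the same conditions. A minimal Python sketch of that ratio follows. Subtracting a background reading, for example from mock-transfected wells, is an assumption added for illustration and is not a step described in the text; the readings in the example are made-up numbers.

```python
"""Sketch: fold increase in cellular fluorescence relative to a reference protein.

Assumption: a background reading (e.g. mock-transfected cells) is subtracted
from both samples before taking the ratio; pass background=0 to skip this.
"""


def fold_increase(
    mutant_reading: float, reference_reading: float, background: float = 0.0
) -> float:
    """Return (mutant - background) / (reference - background)."""
    reference_signal = reference_reading - background
    if reference_signal <= 0:
        raise ValueError("reference signal must exceed background")
    return (mutant_reading - background) / reference_signal


if __name__ == "__main__":
    # Hypothetical plate-reader values, for illustration only.
    print(fold_increase(mutant_reading=1050.0, reference_reading=100.0, background=50.0))
```

With these hypothetical inputs the mutant signal above background (1000) is 20 times the reference signal above background (50).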

A list of GFP and BFP mutant proteins indicating the
introduced amino acid mutations is shown in Table I.



TABLE I: GFP and BFP mutants

             Amino acid position
Protein      65    66    67    164   168   239
wt GFP       F     S     Y     V     I     K
SG12         L
SG11         L                       T     N
SG25         L     C                 T     N
BFP                      H
SB42         L           H
SB49                     H     A
SB50         L           H     A

(Blank entries indicate the wild-type residue.)


Example 1: SG12

Example 1; SG12 

A number of the unique mutants described herein
derive from the discovery of an unplanned and unexpected
mutation called "SG12", obtained in the course of
site-directed mutagenesis experiments, wherein the phenylalanine at
position 65 of wtGFP was converted to leucine. SG12 was
prepared as follows: Two plasmids carrying SG12 (SEQ ID NO: 15)
were generated, pFRED12 for expression in mammalian cells and
pFRED16 for expression in E. coli and protein purification.
pFRED12 was constructed by ligating the 1557 bp fragment from
the double digestion of pFRED7 with AvrII and PmlI into the
4681 bp fragment generated from the AvrII and PmlI digestion
of pFRED11 (see below). pFRED16 was derived by subcloning the
717 bp segment resulting from the digestion of pFRED12 with
NheI and BamHI into the 5644 bp fragment of the pET11a vector
digested with the same restriction enzymes.

The specific activity of SG12 was about 9-12 times
that of wtGFP. See Table II.



Example 2: SG11

A mutant referred to as "SG11," which combined the
phenylalanine 65 to leucine alteration with an isoleucine 168
to threonine substitution and a lysine 239 to asparagine
substitution, gave a further enhanced fluorescence intensity.
SG11 was prepared as follows: Two plasmids carrying SG11 (SEQ
ID NO: 16) were generated: pFRED11 for expression in mammalian
cells and pFRED15 for expression in E. coli and protein
purification. pFRED11 was constructed by ligating the 717 bp
region from pBSGFPsg11 DNA digested with NheI and BamHI to the
5221 bp fragment derived from the digestion of pFRED7 with the
same enzymes. pFRED15 was generated by subcloning the 717 bp
segment resulting from the digestion of pFRED11 with NheI and
BamHI into the 5644 bp fragment of the pET11a vector, digested
with the same restriction enzymes.

The mutant SG11 encodes an engineered GFP wherein
the alteration comprises the conversion of phenylalanine 65 to
leucine and the conversion of isoleucine 168 to threonine.
The additional alteration of the C-terminal Lys239 to Asn is
without effect; the C-terminal Lys or Asn may be deleted
without affecting fluorescence. The specific activity of SG11
is about 19-38 times that of wtGFP. See Table II.



WO 97/42320 



PCT/US97/07625 



50 

Example 3: SG25

A third and further improved GFP mutant was obtained
by further mutating "SG11." This mutant is referred to as
"SG25" and comprises, in addition to the SG11 substitutions,
an additional substitution of a cysteine for the serine
normally found at position 66 in the sequence. SG25 was
prepared as follows: Two plasmids carrying SG25 (SEQ ID NO: 17)
were generated: pFRED25 for expression in mammalian cells and
pFRED63 for expression in E. coli and protein purification.
pFRED25 was constructed by site-directed mutagenesis of
pFRED11, using oligonucleotide #18217 (SEQ ID NO: 18):
5'-CATTGAACACCATAGCACAGAGTAGTGACTAGTGTTGGCC-3'. This
oligonucleotide incorporates the Ser66Cys mutation into SG11.
Ser66Cys had been shown both to alter the GFP excitation
maximum without significant change in the emission spectrum and
to increase the intensity of the fluorescent signal of
GFP (Heim et al., 1995).
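Because #18217 is written as an antisense oligonucleotide, its coding effect is easiest to see after reverse-complementing it and translating in frame. In the Python sketch below, the frame offset (2) and the number of the first complete codon (59) were worked out by aligning the reverse complement with the wild-type sequence of SEQ ID NO: 1; they are not given in the text. The result shows Leu at 65 (the SG11 change already present in the pFRED11 template) and Cys at 66. The reverse complement also carries a silent CTT→CTA change at codon 61 that creates an ACTAGT (SpeI) sequence. Helper names are illustrative.

```python
"""Sketch: decode the antisense mutagenic oligonucleotide #18217 (SEQ ID NO: 18)."""

BASES = "TCAG"
# Standard genetic code in TCAG order ('*' = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

OLIGO_18217 = "CATTGAACACCATAGCACAGAGTAGTGACTAGTGTTGGCC"  # antisense, 5'->3'
WT_59_TO_70 = "PTLVTTFSYGVQ"  # residues 59-70 of SEQ ID NO: 1


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def sense_residues(antisense: str, frame: int, first_codon: int) -> dict:
    """Translate the coding strand of an antisense oligo, keyed by codon number."""
    coding = reverse_complement(antisense)[frame:]
    return {
        first_codon + n: CODON_TABLE[coding[3 * n : 3 * n + 3]]
        for n in range(len(coding) // 3)
    }


if __name__ == "__main__":
    residues = sense_residues(OLIGO_18217, frame=2, first_codon=59)
    changes = [
        f"{WT_59_TO_70[pos - 59]}{pos}{aa}"
        for pos, aa in residues.items()
        if aa != WT_59_TO_70[pos - 59]
    ]
    print("".join(residues.values()), changes)  # PTLVTTLCYGVQ ['F65L', 'S66C']
```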

pFRED63 was generated by subcloning the 717 bp
segment resulting from the digestion of pFRED25 with NheI and
BamHI into the 5644 bp fragment of the pET11a vector, digested
with the same restriction enzymes.

The mutant SG25 encodes an engineered GFP wherein
the alteration comprises the conversion of phenylalanine 65 to
leucine, the conversion of isoleucine 168 to threonine and the
conversion of serine 66 to cysteine. As with SG11, the
additional alteration of the C-terminal lysine 239 to
asparagine is without effect; the C-terminal lysine or
asparagine may be deleted without affecting fluorescence. The
specific activity of SG25 is about 56 times that of wtGFP.
See Table II.

Example 4: Additional green fluorescent mutants

Additional alterations at different amino acids of
the wtGFP, when combined with SG11 and SG25, yielded proteins
having at least five-fold greater cellular fluorescence than
the wtGFP. A non-limiting list of these mutants is provided
below:
GFP variants with enhanced cellular fluorescence

Protein    Altered amino acids
SG20       F65L, S66T, I168T, K239N
SG21       F65L, S66A, I168T, K239N
SG27       Y40L, F65L, I168T, K239N
SG30       F47L, F65L, I168T, K239N
SG32       F65L, F72L, I168T, K239N
SG43       F65L, I168T, Y201L, K239N
SG46       F65L, V164A, I168T, K239N
SG72       F65L, S66C, V164A, I168T, K239N
SG91       F65L, S66C, F100L, I168T, K239N
SG94       F65L, S66C, Y107L, I168T, K239N
SG95       F65L, S66C, F115L, I168T, K239N
SG96       F65L, S66C, F131L, I168T, K239N
SG98       F65L, S66C, Y146L, I168T, K239N
SG100      F65L, S66C, Y152L, I168T, K239N
SG101      F65L, S66C, I168T, Y183L, K239N
SG102      F65L, S66C, I168T, F224L, K239N
SG103      F65L, S66C, I168T, Y238L, K239N
SG106      F65L, S66T, V164A, I168T, K239N
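Each label in this list uses the usual single-letter convention: F65L means that the wild-type Phe at position 65 is replaced by Leu. The labels can be checked mechanically against the wild-type sequence. The Python sketch below uses residues 1-208 of SEQ ID NO: 1, which is as far as the sequence listing in this section runs, so positions beyond 208 (F224L, Y238L, K239N) are reported as unchecked rather than verified. The function names are illustrative.

```python
"""Sketch: check single-letter mutation labels (e.g. 'F65L') against wild-type GFP."""

import re
from typing import Optional

# Residues 1-208 of wild-type GFP from SEQ ID NO: 1, one 16-residue listing row per string.
WT_1_208 = (
    "MASKGEELFTGVVPIL" "VELDGDVNGHKFSVSG" "EGEGDATYGKLTLKFI" "CTTGKLPVPWPTLVTT"
    "FSYGVQCFSRYPDHMK" "RHDFFKSAMPEGYVQE" "RTIFFKDDGNYKTRAE" "VKFEGDTLVNRIELKG"
    "IDFKEDGNILGHKLEY" "NYNSHNVYIMADKQKN" "GIKVNFKIRHNIEDGS" "VQLADHYQQNTPIGDG"
    "PVLLPDNHYLSTQSAL"
)

# Variants transcribed from the list above.
VARIANTS = {
    "SG20": "F65L, S66T, I168T, K239N",
    "SG21": "F65L, S66A, I168T, K239N",
    "SG27": "Y40L, F65L, I168T, K239N",
    "SG30": "F47L, F65L, I168T, K239N",
    "SG32": "F65L, F72L, I168T, K239N",
    "SG43": "F65L, I168T, Y201L, K239N",
    "SG46": "F65L, V164A, I168T, K239N",
    "SG72": "F65L, S66C, V164A, I168T, K239N",
    "SG91": "F65L, S66C, F100L, I168T, K239N",
    "SG94": "F65L, S66C, Y107L, I168T, K239N",
    "SG95": "F65L, S66C, F115L, I168T, K239N",
    "SG96": "F65L, S66C, F131L, I168T, K239N",
    "SG98": "F65L, S66C, Y146L, I168T, K239N",
    "SG100": "F65L, S66C, Y152L, I168T, K239N",
    "SG101": "F65L, S66C, I168T, Y183L, K239N",
    "SG102": "F65L, S66C, I168T, F224L, K239N",
    "SG103": "F65L, S66C, I168T, Y238L, K239N",
    "SG106": "F65L, S66T, V164A, I168T, K239N",
}

LABEL = re.compile(r"([A-Z])(\d+)([A-Z])")


def check_label(label: str, wild_type: str = WT_1_208) -> Optional[bool]:
    """True/False if the stated wild-type residue matches; None if beyond the sequence."""
    match = LABEL.fullmatch(label.strip())
    if match is None:
        raise ValueError(f"unrecognized mutation label: {label!r}")
    original, position = match.group(1), int(match.group(2))
    if position > len(wild_type):
        return None  # not covered by the residues included here
    return wild_type[position - 1] == original


def mismatches(variants: dict) -> list:
    """List (variant, label) pairs whose wild-type residue disagrees with the sequence."""
    return [
        (name, label.strip())
        for name, labels in variants.items()
        for label in labels.split(",")
        if check_label(label) is False
    ]


if __name__ == "__main__":
    print(mismatches(VARIANTS))  # an empty list means every checkable label agrees
```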



Example 5: SB42

The blue fluorescent proteins described here and
below were derived from the known GFP mutant (Heim et al.,
PNAS, 1994) wherein histidine is substituted for tyrosine at
position 67. We have designated this known mutant
BFP(Tyr67→His). BFP(Tyr67→His) has a shifted emission
spectrum: it emits blue light, i.e., it is a blue fluorescent
protein (BFP).

By introducing into BFP(Tyr67→His) the same mutation
that was used to generate SG12, i.e., leucine for
phenylalanine at position 65, we created a new mutant with
unexpectedly high fluorescence that we refer to as
"SuperBlue-42" (SB42). SB42 was prepared as follows: Two plasmids
carrying SB42 (SEQ ID NO: 19) were generated: pFRED42 for
expression in mammalian cells and pFRED65 for expression in E.
coli and protein purification. pFRED42 was constructed by
site-directed mutagenesis of pFRED12, using oligonucleotide
#bio25 (5'-CATTGAACACCATGAGAGAGAGTAGTGACTAGTGTTGGCC-3') (SEQ ID
NO: 20). This oligonucleotide incorporates the Tyr67→His
mutation into SG12, thus generating the Phe65Leu, Tyr67→His
double mutant.

pFRED65 was created by subcloning the 717 bp segment
resulting from the digestion of pFRED42 with NheI and BamHI into
the 5644 bp fragment of the pET11a vector, digested with the
same restriction enzymes.

The mutant SB42 encodes an engineered BFP wherein
the alterations comprise the conversion of tyrosine 67 to
histidine and the conversion of phenylalanine 65 to leucine.
The specific activity of SB42 is about 27 times that of
BFP(Tyr67→His). See Table II.

Example 6: SB49

An independent mutation of BFP(Tyr67→His), which
substitutes alanine for the valine at position 164, is
referred to as "SB49." SB49 was prepared as follows: Plasmid
pFRED49 expresses SB49 (SEQ ID NO: 21) in mammalian cells.
pFRED49 was generated by site-directed mutagenesis of pFRED12,
using oligonucleotides #19059 and #bio24. Oligonucleotide
#19059 (5'-CTTCAATGTTGTGGCGGATCTTGAAGTTCGCTTTGATTCCATTC-3')
(SEQ ID NO: 22) introduces the Val164Ala mutation into SG12, while
oligonucleotide #bio24
(5'-CATTGAACACCATGAGAGAAAGTAGTGACTAGTGTTGGCC-3') (SEQ ID NO: 23)
reverts the Phe65Leu alteration to the wt sequence and, at the
same time, incorporates the Tyr67→His mutation.
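The same reverse-complement-and-translate approach shows what each BFP oligonucleotide encodes. In this Python sketch, the frame offsets and starting codon numbers were derived by aligning each reverse complement with SEQ ID NO: 1 (codons 59-70 for #bio24 and #bio25, codons 160-173 for #19059); they are not stated in the text. The output agrees with the description: #bio25 carries Leu65 and His67, #bio24 keeps Phe65 with His67, and #19059 changes Val164 to Ala, with silent codon changes at positions 167-169 that leave Ile168 intact. Names are illustrative.

```python
"""Sketch: decode the antisense oligonucleotides used to build the BFP mutants."""

BASES = "TCAG"
# Standard genetic code in TCAG order ('*' = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# name: (antisense sequence 5'->3', reading-frame offset, number of first full codon).
# Offsets and codon numbers come from aligning with SEQ ID NO: 1 (not from the text).
OLIGOS = {
    "#bio25": ("CATTGAACACCATGAGAGAGAGTAGTGACTAGTGTTGGCC", 2, 59),
    "#bio24": ("CATTGAACACCATGAGAGAAAGTAGTGACTAGTGTTGGCC", 2, 59),
    "#19059": ("CTTCAATGTTGTGGCGGATCTTGAAGTTCGCTTTGATTCCATTC", 1, 160),
}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def sense_residues(antisense: str, frame: int, first_codon: int) -> dict:
    """Translate the coding strand of an antisense oligo, keyed by codon number."""
    coding = reverse_complement(antisense)[frame:]
    return {
        first_codon + n: CODON_TABLE[coding[3 * n : 3 * n + 3]]
        for n in range(len(coding) // 3)
    }


if __name__ == "__main__":
    for name, (seq, frame, first) in OLIGOS.items():
        residues = sense_residues(seq, frame, first)
        print(name, "".join(residues.values()), f"(codons {first}-{max(residues)})")
```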

The mutant SB49 encodes an engineered BFP wherein
the alterations comprise the conversion of tyrosine 67 to
histidine and the conversion of valine 164 to alanine. The
specific activity of SB49 was about 37 times that of
BFP(Tyr67→His). See Table II.

Example 7: SB50 

A combination of the above two BFP mutations
resulted in "SB50," which gave an even greater fluorescence
enhancement than either of the previous mutants. SB50 was
prepared as follows: Two plasmids carrying SB50 (SEQ ID NO:
24) were generated: pFRED50 for expression in mammalian cells
and pFRED67 for expression in E. coli and protein
purification. pFRED50 was constructed by site-directed
mutagenesis of pFRED12, using oligonucleotides #19059 and
#bio25.

pFRED67 was created by subcloning the 717 bp segment
resulting from the digestion of pFRED50 with NheI and BamHI into
the 5644 bp fragment of the pET11a vector digested with the
same restriction enzymes.

The mutant SB50 encodes an engineered BFP wherein
the alterations comprise the conversion of tyrosine 67 to
histidine, the conversion of phenylalanine 65 to leucine and
the conversion of valine 164 to alanine. The specific
activity of SB50 was about 63 times that of BFP(Tyr67→His).
See Table II.



TABLE II

Mutant  Excitation    Emission      Increase in green    Increase in blue
        maximum (nm)  maximum (nm)  fluorescence vs.     fluorescence vs.
                                    wtGFP*               BFP(Tyr67→His)*
SG12    398           509           9-12X
SG11    471           508           19-38X
SG25    473           509           50-100X
SB42    387           450                                27X
SB49    387           450                                37X
SB50    387           450                                63X

* Fold increase measured at the maximum emission wavelength.


The dramatic increase in fluorescent activity
resulting from the amino acid substitutions of the present
invention was wholly unexpected. The cellular fluorescence of
the mutants was at least five times greater, and usually over
twenty times greater, than that of the parent wtGFP or
BFP(Tyr67→His). Note that the maximum emission wavelengths
vary among the mutants, and that the fold increases reported
above refer only to minimal increases in relative cellular
fluorescence at the maximum emission wavelength of the mutant.
At a particular wavelength, the values may be substantially
larger, i.e., the mutants may have a 200-fold greater cellular
fluorescence than the reference wtGFP or BFP(Tyr67→His). This
is important because devices for measuring fluorescence often
have set wavelengths, or the limitations of a given experiment
often require the use of a set wavelength. Thus, for example,
the emission and detection parameters of a fluorescence
microscope or a fluorescence-activated cell sorter may be set
for a wavelength at which the cellular fluorescence of a given
mutant is 200-fold greater than that of the known GFPs and
BFPs.

The GFP and BFP mutants of this invention, in
contrast to the wild-type protein or other reported mutants,
allow detection of green fluorescence in living mammalian
cells when present in few copies stably integrated into the
genome. This high cellular fluorescence of the mutant GFPs
and BFPs is useful for rapid and simple detection of gene
expression in living cells and tissues and for repeated
analysis of gene expression over time under a variety of
conditions. The mutants are also useful for the construction of
stable marked cell lines that can be quickly identified by
fluorescence microscopy or fluorescence-activated cell
sorting.

Example 8 

We have established fluoroplate-based assays for the
quantitation of gene expression after transfection. In a
number of embodiments, a nucleic acid encoding a mutant GFP or
BFP of this invention is inserted into a vector and introduced
into and expressed in a cell. Typically, expression of GFP
mutants can be detected within 5 hours post-infection or
sooner. Expression is followed over time in living cells by
a simple measurement in multi-well plates. In this way, many
transfections can be processed in parallel.




Example 9 

The vectors and nucleic acids provided herein are 
used to generate chimeric proteins wherein a nucleic acid 
sequence that encodes a selected gene product is fused to the 
C- or N-terminus of the mutant GFPs and/or BFPs of this
invention. A number of unique viral, plasmid and hybrid gene 
constructs have been generated that incorporate the new mutant 
GFP and/or mutant BFP sequences indicated above. These 
include:

• HIV viral sequences (in the nef gene) containing SG11 or 
SG25 

• Neomycin & hygromycin plasmids containing SG11 or SG25 

• Moloney Leukemia Virus vector (retrovirus) also 
expressing SG25 

• Hybrid gene constructs expressing HIV viral proteins 
(rev, td-rev, tat, nef, gag, env, and vpr) and either 
SG11 or SG25 or SB50. 

• Hybrid gene constructs containing vectors that incorporate
the cytoplasmic proteins Ran, B23, nucleolin, or poly(A)
binding protein and either SG11 or SG25 or SB50.

These hybrids of the mutant nucleic acids provided
herein are used to study protein trafficking in living
mammalian cells. Like wild-type GFP, the mutant GFP
proteins are normally distributed throughout the cell except
for the nucleolus. Fusions to other proteins redistribute the
fluorescence, depending on the partner in the hybrid. For
example, fusion with the entire HIV-1 Rev protein results in a
hybrid molecule that retains Rev function and is
localized in the nucleolus, where Rev is preferentially found.
Fusion to the N-terminal domain of the HIV-1 Nef protein
created a chimeric protein detected in the plasma membrane,
the site of Nef localization.

Example 10: pCMVgfo11

pCMVgfo11 is a pFRED11 derivative containing the
bacterial neomycin phosphotransferase gene (neo) (Southern and
Berg (1982) J. Mol. Appl. Genetics 1:327) fused at the
C-terminus of SG11. A four-amino-acid (Gly-Ala-Gly-Ala) (SEQ
ID NO: 26) linker region connects the last amino acid of SG11
to the second amino acid of neo, thus generating the hybrid
SG11-neo protein (gfo11, SEQ ID NO: 25). Gfo11 is expressed
from the CMV promoter and contains the intact SG11 polypeptide
and all of neo except for the first Met.

pCMVgfo11 was constructed in several steps. First,
pFRED11DNae was constructed by NaeI digestion of pFRED11 and
self-ligation of the 4613 bp fragment. The NaeI deletion
removes the SV40 promoter and neo gene from pFRED11, thus
creating pFRED11DNae. Next, in order to fuse the neo coding
region downstream of SG11, the neo gene was PCR-amplified from
pcDNA3 using primers Bio51
(5'-CGCGGATCCTTCGAACAAGATGGATTGCACGC-3') (SEQ ID NO: 27) and
Bio52 (5'-CCGGAATTCTCAGAAGAACTCGTCAAGAAGGCGA-3') (SEQ ID
NO: 28). Primer Bio51 introduces a BamHI site followed by a
BstBI recognition sequence at the 5' end of neo, while primer
Bio52 introduces an EcoRI site 3' to the neo gene. The PCR
product was digested with BamHI and EcoRI and cloned into the
4582 bp vector resulting from the BamHI-EcoRI digestion of
pFRED11DNae, thus generating pFRED11DNaeBstNeo. Subsequently,
SG11 was PCR-amplified from pFRED11DNae using primers Bio49
(5'-GGCGCGCAAGAAATGGCTAGCAAAGGAGAAGAACTCTTCACTGGAG-3') (SEQ ID
NO: 29) and Bio50
(5'-CCCATCGATAGCACCAGCACCGTTGTACAGTTCATCCATGCCATGT-3') (SEQ ID
NO: 30) to remove the SG11 stop codon in pFRED11DNaeBstNeo and
to introduce the four-amino-acid (Gly-Ala-Gly-Ala) linker
followed by a ClaI site. The PCR product was digested with
NheI and ClaI and cloned into the 4763 bp NheI-BstBI fragment
from pFRED11DNaeBstNeo, thus generating pCMVgfo11.
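Primer Bio50 is antisense, so its reverse complement shows what it adds to the 3' end of SG11. The Python sketch below translates it in frame (offset 1, chosen by aligning with the His-Gly-Met-Asp-Glu-Leu-Tyr codons at the C-terminus of GFP that primer #16418 also encodes) and shows that the Asn239 codon runs directly into the Gly-Ala-Gly-Ala linker and a ClaI site (ATCGAT), with no stop codon. It also illustrates why the ClaI-cut PCR product can be joined to the BstBI-cut vector: ClaI (AT^CGAT) and BstBI (TT^CGAA) both leave a 5'-CG overhang. This is an illustrative check, not a procedure from the original work, and the function names are hypothetical.

```python
"""Sketch: what primer Bio50 adds to the 3' end of SG11, and why ClaI meets BstBI."""

BASES = "TCAG"
# Standard genetic code in TCAG order ('*' = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

BIO50 = "CCCATCGATAGCACCAGCACCGTTGTACAGTTCATCCATGCCATGT"  # antisense primer, SEQ ID NO: 30
BIO51 = "CGCGGATCCTTCGAACAAGATGGATTGCACGC"  # neo 5' primer, SEQ ID NO: 27


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(cds: str) -> str:
    """Translate the complete codons of a DNA string into one-letter amino acids."""
    return "".join(CODON_TABLE[cds[i : i + 3]] for i in range(0, len(cds) - 2, 3))


def five_prime_overhang(site: str, cut_after: int) -> str:
    """Overhang of a palindromic site cut after `cut_after` bases on both strands.

    Only meaningful when cut_after < len(site) / 2, i.e. for a 5' overhang.
    """
    return site[cut_after : len(site) - cut_after]


if __name__ == "__main__":
    coding = reverse_complement(BIO50)
    # Offset 1 puts the frame on the C-terminal His-Gly-Met-Asp-Glu-Leu-Tyr codons of GFP.
    print(translate(coding[1:]))  # HGMDELYNGAGAIDG: ...Asn239, Gly-Ala-Gly-Ala, then ClaI
    print("ClaI:", five_prime_overhang("ATCGAT", 2), "BstBI:", five_prime_overhang("TTCGAA", 2))
```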

Following transfection of 293 cells (Graham et al.
(1977), J. Gen. Virol. 5:59) as well as other human and mouse
cell lines with pCMVgfo11, brightly fluorescent transfectants
were apparent under the fluorescence microscope, and colonies
resistant to G418 could be obtained two weeks later.

It should be noted that pCMVgfo11 was the best
protein fusion in terms of fluorescent emission intensity and
number of G418-resistant colonies among the several SG11-neo
and neo-SG11 fusions generated and examined.

Example 11: pPGKgfo25

pPGKgfo25 is a pCMVgfo11 derivative containing SG25
instead of SG11 within gfo (SEQ ID NO: 31). Expression of
gfo25 in pPGKgfo25 is under the control of the mouse
phosphoglycerate kinase-1 (PGK) promoter.

pPGKgfo25 was constructed in several steps. First,
a SacII site was introduced downstream of the PGK promoter in
pPGKneobpA (Soriano et al. (1991) Cell 64:393) by:

i) annealing oligonucleotides #18990 (SEQ ID NO: 32)
(5'-GACCGGGACACGTATCCAGCCTCCGC-3') and #18991 (SEQ ID
NO: 33) (5'-GGAGGCTGGATACGTGTCCCGGTCTGCA-3') to create a
double-stranded adapter with a PstI-compatible end at the 5' end
and a SacII-compatible end at the 3' end;

ii) ligating this adapter to the 3423 bp fragment from the
PstI-SacII double digestion of pPGKneobpA, thus
generating pPGKPtAfSc.

Next, the CMV promoter of pFRED25 was replaced with the PGK
promoter by cloning the 565 bp SalI (filled with Klenow)-SacII
region from pPGKPtAfSc into the 5288 bp BglII (filled with
Klenow)-SacII fragment from pFRED25, resulting in pFRED25PGK.
In the final step, pPGKgfo25 was constructed by ligating the
813 bp BglII-NdeI fragment from pFRED25PGK, containing the PGK
promoter and SG25, to the 4185 bp BglII-NdeI fragment of
pCMVgfo11.
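Step i) relies on the two oligonucleotides annealing with protruding ends that match PstI and SacII cuts. The Python sketch below finds the longest perfectly paired register between #18990 and #18991 and reports the unpaired bases left at each end. It is an illustrative check written for this description, not a procedure from the original work, and the names are hypothetical. For reference, PstI (CTGCA^G) leaves a 3' TGCA overhang and SacII (CCGC^GG) leaves a 3' GC overhang.

```python
"""Sketch: find the sticky ends of the #18990/#18991 PstI-SacII adapter."""

from typing import NamedTuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")

OLIGO_18990 = "GACCGGGACACGTATCCAGCCTCCGC"  # SEQ ID NO: 32
OLIGO_18991 = "GGAGGCTGGATACGTGTCCCGGTCTGCA"  # SEQ ID NO: 33


class Duplex(NamedTuple):
    paired_length: int
    left_end: str  # unpaired bases beyond the 5' end of the top strand
    right_end: str  # unpaired bases beyond the 3' end of the paired region


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def anneal(top: str, bottom: str, min_pair: int = 8) -> Duplex:
    """Longest perfectly paired register of two oligos, with the unpaired ends.

    `bottom` is given 5'->3'; its reverse complement is aligned with `top`,
    so both are read in the same direction.
    """
    rc_bottom = reverse_complement(bottom)
    best = None
    for shift in range(-len(top) + 1, len(rc_bottom)):
        # top[p] pairs with the bottom-strand base shown at rc_bottom[p + shift].
        start = max(0, -shift)
        stop = min(len(top), len(rc_bottom) - shift)
        length = stop - start
        if length < min_pair:
            continue
        if top[start:stop] == rc_bottom[start + shift : stop + shift]:
            if best is None or length > best[0]:
                best = (length, shift, start, stop)
    if best is None:
        raise ValueError("no perfectly paired register found")
    length, shift, start, stop = best
    # Whichever strand extends past the paired region leaves an unpaired end.
    left = top[:start] if start > 0 else rc_bottom[: start + shift]
    right = top[stop:] if stop < len(top) else rc_bottom[stop + shift :]
    return Duplex(length, left, right)


if __name__ == "__main__":
    print(anneal(OLIGO_18990, OLIGO_18991))
```

The result pairs 24 bases and leaves TGCA unpaired at the 3' end of #18991 and GC unpaired at the 3' end of #18990, matching a PstI-compatible end on one side of the adapter and a SacII-compatible end on the other.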

Example 12: pGen-PGKgfo25RO (SEQ ID NO: 34)

pGen-PGKgfo25RO is a derivative of pGen- (Soriano et al. (1991), J.
Virol. 65:2314) containing the gfo25 hybrid under
the control of the PGK promoter. It was constructed by subcloning
the 2810 bp SalI fragment of pPGKgfo25 into the XhoI site of
pGen-. In viruses generated from pGen-PGKgfo25RO (see below),
transcription originating from the PGK promoter is in reverse
orientation (RO) to that initiated from the viral long
terminal repeats (LTR).
To generate ecotropic or pseudotyped viruses,
pGen-PGKgfo25RO was cotransfected into 293 cells together
with pHIT60 and pHIT123 DNAs (production of ecotropic virus)
or with pHIT60 and pHCMV-G DNAs (production of pseudotyped
virus). pHIT60 and pHIT123 contain the gag-pol and env coding
regions, respectively, of the Moloney murine leukemia virus
(Mo-MLV) under the control of the CMV promoter (Soneoka
et al. (1995), Nuc. Acid Res. 23:628). pHCMV-G contains the
coding region of the G protein of vesicular stomatitis
virus (VSV) expressed from the CMV promoter (Yee et al.
(1994), Proc. Nat'l Acad. Sci. USA 91:9564). Virus-containing
supernatants were harvested 48 hours post-transfection,
filtered and stored at -80°C.

Example 13: pNLnSG11 (SEQ ID NO: 35)

The SG11 sequence from plasmid pFRED11 was PCR-amplified
with primers #17982 (SEQ ID NO: 36)
(5'-GGGGCGTACGGAGCGCTCCGAATTCGGTACCGTTTAAACGGGCCCTCTCGAGTCCGTTGTACAGTTCATCCATG-3')
and #17983 (SEQ ID NO: 37)
(5'-GGGGGAATTCGCGCGCGTACGTAAGCGCTAGCTGAGCAAGAAATGGCTAGCAAAGGAGAAGAACTC-3').
The PCR product was digested with BlpI and XhoI and cloned
into the large BlpI-XhoI fragment from pNL4-3
(Adachi et al. (1986), J. Virol. 59:284). In pNLnSG11 the
full SG11 polypeptide, containing an additional four
linker-encoded amino acids at the C-terminus, is expressed as
a hybrid protein with the 24 N-terminal amino acids of the
HIV-1 protein Nef.

We constructed transmissible HIV-1 stocks with our
mutants, which generate green fluorescence upon transfection
of human cells. These transmissible HIV-1 stocks are used to
detect the kinetics of infection under a variety of
conditions. In particular, they are used to study the effects
of drugs on the kinetics of infection. The level of
fluorescence, and the subcellular compartmentalization of that
fluorescence, are easily visualized and quantified using
well-known methods. This system dramatically cuts the costs of
many experiments that are presently tedious and expensive.
To produce infectious virus, pNLnSG11 was
transfected into 293 cells. Twenty-four hours later, Jurkat cells were
added to the transfectants. At various times post-infection,
the medium was removed, filtered, and used to infect fresh
Jurkat or other HIV-1-permissive cells. Two days later the
infected cells were green under the fluorescence microscope.
Visible syncytia were also green. Viral stocks were generated
and kept at -80°C.

When the nucleic acids, vectors, and mutant proteins
provided herein are combined with the knowledge of those 
skilled in the art of genetic engineering and the guidance 
provided herein, it will be apparent to one of ordinary skill 
in the art that many changes and modifications can be made 
thereto without departing from the spirit or scope of the 
invention as set forth herein. These changes and 
modifications are encompassed by the present invention. 
SEQUENCE LISTING

(1) GENERAL INFORMATION:

    (i) APPLICANT: Pavlakis, George N.
                   Gaitanaris, George A.
                   Stauber, Roland H.
                   Vournakis, John N.

    (ii) TITLE OF INVENTION: Mutant Aequorea victoria Fluorescent
         Proteins Having Increased Cellular Fluorescence

    (iii) NUMBER OF SEQUENCES: 37

    (iv) CORRESPONDENCE ADDRESS:
        (A) ADDRESSEE: Townsend and Townsend and Crew LLP
        (B) STREET: Two Embarcadero Center, 8th Floor
        (C) CITY: San Francisco
        (D) STATE: California
        (E) COUNTRY: USA
        (F) ZIP: 94111-3834

    (v) COMPUTER READABLE FORM:
        (A) MEDIUM TYPE: Floppy disk
        (B) COMPUTER: IBM PC compatible
        (C) OPERATING SYSTEM: PC-DOS/MS-DOS
        (D) SOFTWARE: PatentIn Release #1.0, Version #1.30

    (vi) CURRENT APPLICATION DATA:
        (A) APPLICATION NUMBER: US Not yet assigned
        (B) FILING DATE: Not yet assigned
        (C) CLASSIFICATION:

    (viii) ATTORNEY/AGENT INFORMATION:
        (A) NAME: Weber, Kenneth A.
        (B) REGISTRATION NUMBER: 31,677
        (C) REFERENCE/DOCKET NUMBER: 015280-249000

    (ix) TELECOMMUNICATION INFORMATION:
        (A) TELEPHONE: (415) 576-0200
        (B) TELEFAX: (415) 576-0300

(2) INFORMATION FOR SEQ ID NO: 1:

    (i) SEQUENCE CHARACTERISTICS:
        (A) LENGTH: 720 base pairs
        (B) TYPE: nucleic acid
        (C) STRANDEDNESS: single
        (D) TOPOLOGY: linear

    (ii) MOLECULE TYPE: cDNA

    (ix) FEATURE:
        (A) NAME/KEY: CDS
        (B) LOCATION: 1..720
        (D) OTHER INFORMATION: /product= "wild type Aequorea victoria
            Green Fluorescent Protein (wtGFP)"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:

ATG GCT AGC AAA GGA GAA GAA CTC TTC ACT GGA GTT GTC CCA ATT CTT 48
Met Ala Ser Lys Gly Glu Glu Leu Phe Thr Gly Val Val Pro Ile Leu
1 5 10 15

GTT GAA TTA GAT GGT GAT GTT AAT GGG CAC AAA TTT TCT GTC AGT GGA 96
Val Glu Leu Asp Gly Asp Val Asn Gly His Lys Phe Ser Val Ser Gly
20 25 30

GAG GGT GAA GGT GAT GCA ACA TAC GGA AAA CTT ACC CTT AAA TTT ATT 144
Glu Gly Glu Gly Asp Ala Thr Tyr Gly Lys Leu Thr Leu Lys Phe Ile
35 40 45

TGC ACT ACT GGA AAA CTA CCT GTT CCA TGG CCA ACA CTT GTC ACT ACT 192
Cys Thr Thr Gly Lys Leu Pro Val Pro Trp Pro Thr Leu Val Thr Thr
50 55 60

TTC TCT TAT GGT GTT CAA TGC TTT TCA AGA TAC CCG GAT CAT ATG AAA 240
Phe Ser Tyr Gly Val Gln Cys Phe Ser Arg Tyr Pro Asp His Met Lys
65 70 75 80

CGG CAT GAC TTT TTC AAG AGT GCC ATG CCC GAA GGT TAT GTA CAG GAA 288
Arg His Asp Phe Phe Lys Ser Ala Met Pro Glu Gly Tyr Val Gln Glu
85 90 95

AGA ACT ATA TTT TTC AAA GAT GAC GGG AAC TAC AAG ACA CGT GCT GAA 336
Arg Thr Ile Phe Phe Lys Asp Asp Gly Asn Tyr Lys Thr Arg Ala Glu
100 105 110

GTC AAG TTT GAA GGT GAT ACC CTT GTT AAT AGA ATC GAG TTA AAA GGT 384
Val Lys Phe Glu Gly Asp Thr Leu Val Asn Arg Ile Glu Leu Lys Gly
115 120 125

ATT GAT TTT AAA GAA GAT GGA AAC ATT CTT GGA CAC AAA TTG GAA TAC 432
Ile Asp Phe Lys Glu Asp Gly Asn Ile Leu Gly His Lys Leu Glu Tyr
130 135 140

AAC TAT AAC TCA CAC AAT GTA TAC ATC ATG GCA GAC AAA CAA AAG AAT 480
Asn Tyr Asn Ser His Asn Val Tyr Ile Met Ala Asp Lys Gln Lys Asn
145 150 155 160

GGA ATC AAA GTT AAC TTC AAA ATT AGA CAC AAC ATT GAA GAT GGA AGC 528
Gly Ile Lys Val Asn Phe Lys Ile Arg His Asn Ile Glu Asp Gly Ser
165 170 175

GTT CAA CTA GCA GAC CAT TAT CAA CAA AAT ACT CCA ATT GGC GAT GGC 576
Val Gln Leu Ala Asp His Tyr Gln Gln Asn Thr Pro Ile Gly Asp Gly
180 185 190

CCT GTC CTT TTA CCA GAC AAC CAT TAC CTG TCC ACA CAA TCT GCC CTT 624
Pro Val Leu Leu Pro Asp Asn His Tyr Leu Ser Thr Gln Ser Ala Leu
195 200 205

TCG AAA GAT CCC AAC GAA AAG AGA GAC CAC ATG GTC CTT CTT GAG TTT 672
Ser Lys Asp Pro Asn Glu Lys Arg Asp His Met Val Leu Leu Glu Phe
210 215 220

GTA ACA GCT GCT GGG ATT ACA CAT GGC ATG GAT GAA CTA TAC AAA TAA 720
Val Thr Ala Ala Gly Ile Thr His Gly Met Asp Glu Leu Tyr Lys *
225 230 235 240
(2) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 239 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:

Met Ala Ser Lys Gly Glu Glu Leu Phe Thr Gly Val Val Pro Ile Leu
1 5 10 15

Val Glu Leu Asp Gly Asp Val Asn Gly His Lys Phe Ser Val Ser Gly
20 25 30

Glu Gly Glu Gly Asp Ala Thr Tyr Gly Lys Leu Thr Leu Lys Phe Ile
35 40 45

Cys Thr Thr Gly Lys Leu Pro Val Pro Trp Pro Thr Leu Val Thr Thr
50 55 60

Phe Ser Tyr Gly Val Gln Cys Phe Ser Arg Tyr Pro Asp His Met Lys
65 70 75 80

Arg His Asp Phe Phe Lys Ser Ala Met Pro Glu Gly Tyr Val Gln Glu
85 90 95

Arg Thr Ile Phe Phe Lys Asp Asp Gly Asn Tyr Lys Thr Arg Ala Glu
100 105 110

Val Lys Phe Glu Gly Asp Thr Leu Val Asn Arg Ile Glu Leu Lys Gly
115 120 125

Ile Asp Phe Lys Glu Asp Gly Asn Ile Leu Gly His Lys Leu Glu Tyr
130 135 140

Asn Tyr Asn Ser His Asn Val Tyr Ile Met Ala Asp Lys Gln Lys Asn
145 150 155 160

Gly Ile Lys Val Asn Phe Lys Ile Arg His Asn Ile Glu Asp Gly Ser
165 170 175

Val Gln Leu Ala Asp His Tyr Gln Gln Asn Thr Pro Ile Gly Asp Gly
180 185 190

Pro Val Leu Leu Pro Asp Asn His Tyr Leu Ser Thr Gln Ser Ala Leu
195 200 205

Ser Lys Asp Pro Asn Glu Lys Arg Asp His Met Val Leu Leu Glu Phe
210 215 220

Val Thr Ala Ala Gly Ile Thr His Gly Met Asp Glu Leu Tyr Lys
225 230 235



(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 35 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..35

(D) OTHER INFORMATION: /note= "oligonucleotide sense primer
#16417"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:
GGAGGCGCGC AAGAAATGGC TAGCAAAGGA GAAGA 35



(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 36 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..36

(D) OTHER INFORMATION: /note= "oligonucleotide antisense primer
#16418"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:
GCGGGATCCT TATTTGTATA GTTCATCCAT GCCATG 36



(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 6238 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..6238

(D) OTHER INFORMATION: /note= "pFRED7"



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:

GACGGATCGG GAGATCTCCC GATCCCCTAT GGTCGACTCT CAGTACAATC TGCTCTGATG 60

CCGCATAGTT AAGCCAGTAT CTGCTCCCTG CTTGTGTGTT GGAGGTCGCT GAGTAGTGCG 120

CGAGCAAAAT TTAAGCTACA ACAAGGCAAG GCTTGACCGA CAATTGCATG AAGAATCTGC 180

TTAGGGTTAG GCGTTTTGCG CTGCTTCGCC TCGAGGCCTG GCCATTGCAT ACGTTGTATC 240

CATATCATAA TATGTACATT TATATTGGCT CATGTCCAAC ATTACCGCCA TGTTGACATT 300

GATTATTGAC TAGTTATTAA TAGTAATCAA TTACGGGGTC ATTAGTTCAT AGCCCATATA 360

TGGAGTTCCG CGTTACATAA CTTACGGTAA ATGGCCCGCC TGGCTGACCG CCCAACGACC 420

CCCGCCCATT GACGTCAATA ATGACGTATG TTCCCATAGT AACGCCAATA GGGACTTTCC 480

ATTGACGTCA ATGGGTGGAG TATTTACGGT AAACTGCCCA CTTGGCAGTA CATCAAGTGT 540

ATCATATGCC AAGTACGCCC CCTATTGACG TCAATGACGG TAAATGGCCC GCCTGGCATT 600

ATGCCCAGTA CATGACCTTA TGGGACTTTC CTACTTGGCA GTACATCTAC GTATTAGTCA 660

TCGCTATTAC CATGGTGATG CGGTTTTGGC AGTACATCAA TGGGCGTGGA TAGCGGTTTG 720

ACTCACGGGG ATTTCCAAGT CTCCACCCCA TTGACGTCAA TGGGAGTTTG TTTTGGCACC 780

AAAATCAACG GGACTTTCCA AAATGTCGTA ACAACTCCGC CCCATTGACG CAAATGGGCG 840

GTAGGCGTGT ACGGTGGGAG GTCTATATAA GCAGAGCTCG TTTAGTGAAC CGTCAGATCG 900

CCTGGAGACG CCATCCACGC TGTTTTGACC TCCATAGAAG ACACCGGGAC CGATCCAGCC 960

TCCGCGGGCG CGCAAGAAAT GGCTAGCAAA GGAGAAGAAC TCTTCACTGG AGTTGTCCCA 1020

ATTCTTGTTG AATTAGATGG TGATGTTAAT GGGCACAAAT TTTCTGTCAG TGGAGAGGGT 1080

GAAGGTGATG CAACATACGG AAAACTTACC CTTAAATTTA TTTGCACTAC TGGAAAACTA 1140

CCTGTTCCAT GGCCAACACT TGTCACTACT TTCTCTTATG GTGTTCAATG CTTTTCAAGA 1200

TACCCGGATC ATATGAAACG GCATGACTTT TTCAAGAGTG CCATGCCCGA AGGTTATGTA 1260

CAGGAAAGAA CTATATTTTT CAAAGATGAC GGGAACTACA AGACACGTGC TGAAGTCAAG 1320

TTTGAAGGTG ATACCCTTGT TAATAGAATC GAGTTAAAAG GTATTGATTT TAAAGAAGAT 1380

GGAAACATTC TTGGACACAA ATTGGAATAC AACTATAACT CACACAATGT ATACATCATG 1440

GCAGACAAAC AAAAGAATGG AATCAAAGTT AACTTCAAAA TTAGACACAA CATTGAAGAT 1500

GGAAGCGTTC AACTAGCAGA CCATTATCAA CAAAATACTC CAATTGGCGA TGGCCCTGTC 1560

CTTTTACCAG ACAACCATTA CCTGTCCACA CAATCTGCCC TTTCGAAAGA TCCCAACGAA 1620

AAGAGAGACC ACATGGTCCT TCTTGAGTTT GTAACAGCTG CTGGGATTAC ACATGGCATG 1680

GATGAACTAT ACAAATAAGG ATCCACTAGT AACGGCCGCC AGTGTGCTGG AATTCTGCAG 1740

ATATCCATCA CACTGGCGGC CGCTCGAGCA TGCATCTAGA GGGCCCTATT CTATAGTGTC 1800

ACCTAAATGC TAGAGCTCGC TGATCAGCCT CGACTGTGCC TTCTAGTTCC CAGCCATCTG 1860

TTGTTTGCCC CTCCCCCGTG CCTTCCTTGA CCCTGGAAGG TGCCACTCCC ACTGTCCTTT 1920

CCTAATAAAA TGAGGAAATT GCATCGCATT GTCTGAGTAG GTGTCATTCT ATTCTGGGGG 1980

GTGGGGTGGG GCAGGACAGC AAGGGGGAGG ATTGGGAAGA CAATAGCAGG CATGCTGGGG 2040

ATGCGGTGGG CTCTATGGCT TCTGAGGCGG AAAGAACCAG CTGGGGCTCT AGGGGGTATC 2100

CCCACGCGCC CTGTAGCGGC GCATTAAGCG CGGCGGGTGT GGTGGTTACG CGCAGCGTGA 2160

CCGCTACACT TGCCAGCGCC CTAGCGCCCG CTCCTTTCGC TTTCTTCCCT TCCTTTCTCG 2220

CCACGTTCGC CGGCTTTCCC CGTCAAGCTC TAAATCGGGG CATCCCTTTA GGGTTCCGAT 2280

TTAGTGCTTT ACGGCACCTC GACCCCAAAA AACTTGATTA GGGTGATGGT TCACGTAGTG 2340

GGCCATCGCC CTGATAGACG GTTTTTCGCC CTTTGACGTT GGAGTCCACG TTCTTTAATA 2400

GTGGACTCTT GTTCCAAACT GGAACAACAC TCAACCCTAT CTCGGTCTAT TCTTTTGATT 2460

TATAAGGGAT TTTGGGGATT TCGGCCTATT GGTTAAAAAA TGAGCTGATT TAACAAAAAT 2520

TTAACGCGAA TTAATTCTGT GGAATGTGTG TCAGTTAGGG TGTGGAAAGT CCCCAGGCTC 2580

CCCAGGCAGG CAGAAGTATG CAAAGCATGC ATCTCAATTA GTCAGCAACC AGGTGTGGAA 2640

AGTCCCCAGG CTCCCCAGCA GGCAGAAGTA TGCAAAGCAT GCATCTCAAT TAGTCAGCAA 2700

CCATAGTCCC GCCCCTAACT CCGCCCATCC CGCCCCTAAC TCCGCCCAGT TCCGCCCATT 2760

CTCCGCCCCA TGGCTGACTA ATTTTTTTTA TTTATGCAGA GGCCGAGGCC GCCTCTGCCT 2820

CTGAGCTATT CCAGAAGTAG TGAGGAGGCT TTTTTGGAGG CCTAGGCTTT TGCAAAAAGC 2880

TCCCGGGAGC TTGTATATCC ATTTTCGGAT CTGATCAAGA GACAGGATGA GGATCGTTTC 2940

GCATGATTGA ACAAGATGGA TTGCACGCAG GTTCTCCGGC CGCTTGGGTG GAGAGGCTAT 3000

TCGGCTATGA CTGGGCACAA CAGACAATCG GCTGCTCTGA TGCCGCCGTG TTCCGGCTGT 3060

CAGCGCAGGG GCGCCCGGTT CTTTTTGTCA AGACCGACCT GTCCGGTGCC CTGAATGAAC 3120

TGCAGGACGA GGCAGCGCGG CTATCGTGGC TGGCCACGAC GGGCGTTCCT TGCGCAGCTG 3180

TGCTCGACGT TGTCACTGAA GCGGGAAGGG ACTGGCTGCT ATTGGGCGAA GTGCCGGGGC 3240

AGGATCTCCT GTCATCTCAC CTTGCTCCTG CCGAGAAAGT ATCCATCATG GCTGATGCAA 3300

TGCGGCGGCT GCATACGCTT GATCCGGCTA CCTGCCCATT CGACCACCAA GCGAAACATC 3360

GCATCGAGCG AGCACGTACT CGGATGGAAG CCGGTCTTGT CGATCAGGAT GATCTGGACG 3420

AAGAGCATCA GGGGCTCGCG CCAGCCGAAC TGTTCGCCAG GCTCAAGGCG CGCATGCCCG 3480

ACGGCGAGGA TCTCGTCGTG ACCCATGGCG ATGCCTGCTT GCCGAATATC ATGGTGGAAA 3540

ATGGCCGCTT TTCTGGATTC ATCGACTGTG GCCGGCTGGG TGTGGCGGAC CGCTATCAGG 3600

ACATAGCGTT GGCTACCCGT GATATTGCTG AAGAGCTTGG CGGCGAATGG GCTGACCGCT 3660

TCCTCGTGCT TTACGGTATC GCCGCTCCCG ATTCGCAGCG CATCGCCTTC TATCGCCTTC 3720

TTGACGAGTT CTTCTGAGCG GGACTCTGGG GTTCGAAATG ACCGACCAAG CGACGCCCAA 3780

CCTGCCATCA CGAGATTTCG ATTCCACCGC CGCCTTCTAT GAAAGGTTGG GCTTCGGAAT 3840

CGTTTTCCGG GACGCCGGCT GGATGATCCT CCAGCGCGGG GATCTCATGC TGGAGTTCTT 3900

CGCCCACCCC AACTTGTTTA TTGCAGCTTA TAATGGTTAC AAATAAAGCA ATAGCATCAC 3960

AAATTTCACA AATAAAGCAT TTTTTTCACT GCATTCTAGT TGTGGTTTGT CCAAACTCAT 4020

CAATGTATCT TATCATGTCT GTATACCGTC GACCTCTAGC TAGAGCTTGG CGTAATCATG 4080

GTCATAGCTG TTTCCTGTGT GAAATTGTTA TCCGCTCACA ATTCCACACA ACATACGAGC 4140

CGGAAGCATA AAGTGTAAAG CCTGGGGTGC CTAATGAGTG AGCTAACTCA CATTAATTGC 4200

GTTGCGCTCA CTGCCCGCTT TCCAGTCGGG AAACCTGTCG TGCCAGCTGC ATTAATGAAT 4260

CGGCCAACGC GCGGGGAGAG GCGGTTTGCG TATTGGGCGC TCTTCCGCTT CCTCGCTCAC 4320

TGACTCGCTG CGCTCGGTCG TTCGGCTGCG GCGAGCGGTA TCAGCTCACT CAAAGGCGGT 4380

AATACGGTTA TCCACAGAAT CAGGGGATAA CGCAGGAAAG AACATGTGAG CAAAAGGCCA 4440

GCAAAAGGCC AGGAACCGTA AAAAGGCCGC GTTGCTGGCG TTTTTCCATA GGCTCCGCCC 4500

CCCTGACGAG CATCACAAAA ATCGACGCTC AAGTCAGAGG TGGCGAAACC CGACAGGACT 4560

ATAAAGATAC CAGGCGTTTC CCCCTGGAAG CTCCCTCGTG CGCTCTCCTG TTCCGACCCT 4620

GCCGCTTACC GGATACCTGT CCGCCTTTCT CCCTTCGGGA AGCGTGGCGC TTTCTCAATG 4680

CTCACGCTGT AGGTATCTCA GTTCGGTGTA GGTCGTTCGC TCCAAGCTGG GCTGTGTGCA 4740

CGAACCCCCC GTTCAGCCCG ACCGCTGCGC CTTATCCGGT AACTATCGTC TTGAGTCCAA 4800

CCCGGTAAGA CACGACTTAT CGCCACTGGC AGCAGCCACT GGTAACAGGA TTAGCAGAGC 4860

GAGGTATGTA GGCGGTGCTA CAGAGTTCTT GAAGTGGTGG CCTAACTACG GCTACACTAG 4920

AAGGACAGTA TTTGGTATCT GCGCTCTGCT GAAGCCAGTT ACCTTCGGAA AAAGAGTTGG 4980

TAGCTCTTGA TCCGGCAAAC AAACCACCGC TGGTAGCGGT GGTTTTTTTG TTTGCAAGCA 5040

GCAGATTACG CGCAGAAAAA AAGGATCTCA AGAAGATCCT TTGATCTTTT CTACGGGGTC 5100

TGACGCTCAG TGGAACGAAA ACTCACGTTA AGGGATTTTG GTCATGAGAT TATCAAAAAG 5160

GATCTTCACC TAGATCCTTT TAAATTAAAA ATGAAGTTTT AAATCAATCT AAAGTATATA 5220

TGAGTAAACT TGGTCTGACA GTTACCAATG CTTAATCAGT GAGGCACCTA TCTCAGCGAT 5280

CTGTCTATTT CGTTCATCCA TAGTTGCCTG ACTCCCCGTC GTGTAGATAA CTACGATACG 5340

GGAGGGCTTA CCATCTGGCC CCAGTGCTGC AATGATACCG CGAGACCCAC GCTCACCGGC 5400

TCCAGATTTA TCAGCAATAA ACCAGCCAGC CGGAAGGGCC GAGCGCAGAA GTGGTCCTGC 5460

AACTTTATCC GCCTCCATCC AGTCTATTAA TTGTTGCCGG GAAGCTAGAG TAAGTAGTTC 5520

GCCAGTTAAT AGTTTGCGCA ACGTTGTTGC CATTGCTACA GGCATCGTGG TGTCACGCTC 5580

GTCGTTTGGT ATGGCTTCAT TCAGCTCCGG TTCCCAACGA TCAAGGCGAG TTACATGATC 5640

CCCCATGTTG TGCAAAAAAG CGGTTAGCTC CTTCGGTCCT CCGATCGTTG TCAGAAGTAA 5700

GTTGGCCGCA GTGTTATCAC TCATGGTTAT GGCAGCACTG CATAATTCTC TTACTGTCAT 5760

GCCATCCGTA AGATGCTTTT CTGTGACTGG TGAGTACTCA ACCAAGTCAT TCTGAGAATA 5820

GTGTATGCGG CGACCGAGTT GCTCTTGCCC GGCGTCAATA CGGGATAATA CCGCGCCACA 5880

TAGCAGAACT TTAAAAGTGC TCATCATTGG AAAACGTTCT TCGGGGCGAA AACTCTCAAG 5940

GATCTTACCG CTGTTGAGAT CCAGTTCGAT GTAACCCACT CGTGCACCCA ACTGATCTTC 6000

AGCATCTTTT ACTTTCACCA GCGTTTCTGG GTGAGCAAAA ACAGGAAGGC AAAATGCCGC 6060

AAAAAAGGGA ATAAGGGCGA CACGGAAATG TTGAATACTC ATACTCTTCC TTTTTCAATA 6120

TTATTGAAGC ATTTATCAGG GTTATTGTCT CATGAGCGGA TACATATTTG AATGTATTTA 6180

GAAAAATAAA CAAATAGGGG TTCCGCGCAC ATTTCCCCGA AAAGTGCCAC CTGACGTC 6238



(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 3699 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..3699

(D) OTHER INFORMATION: /note= "pBSGFP" 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:

GGAAATTGTA AACGTTAATA TTTTGTTAAA ATTCGCGTTA AATTTTTGTT AAATCAGCTC 60

ATTTTTTAAC CAATAGGCCG AAATCGGCAA AATCCCTTAT AAATCAAAAG AATAGACCGA 120

GATAGGGTTG AGTGTTGTTC CAGTTTGGAA CAAGAGTCCA CTATTAAAGA ACGTGGACTC 180

CAACGTCAAA GGGCGAAAAA CCGTCTATCA GGGCGATGGC CCACTACGTG AACCATCACC 240

CTAATCAAGT TTTTTGGGGT CGAGGTGCCG TAAAGCACTA AATCGGAACC CTAAAGGGAG 300

CCCCCGATTT AGAGCTTGAC GGGGAAAGCC GGCGAACGTG GCGAGAAAGG AAGGGAAGAA 360

AGCGAAAGGA GCGGGCGCTA GGGCGCTGGC AAGTGTAGCG GTCACGCTGC GCGTAACCAC 420

CACACCCGCC GCGCTTAATG CGCCGCTACA GGGCGCGTCG CGCCATTCGC CATTCAGGCT 480

GCGCAACTGT TGGGAAGGGC GATCGGTGCG GGCCTCTTCG CTATTACGCC AGCTGGCGAA 540

AGGGGGATGT GCTGCAAGGC GATTAAGTTG GGTAACGCCA GGGTTTTCCC AGTCACGACG 600

TTGTAAAACG ACGGCCAGTG AATTGTAATA CGACTCACTA TAGGGCGAAT TGGGTACCGG 660

GCCCCCCCTC GAGGTCGACG GTATCGATAA GCTTGATGAT CCTTATTTGT ATAGTTCATC 720

CATGCCATGT GTAATCCCAG CAGCTGTTAC AAACTCAAGA AGGACCATGT GGTCTCTCTT 780

TTCGTTGGGA TCTTTCGAAA GGGCAGATTG TGTGGACAGG TAATGGTTGT CTGGTAAAAG 840

GACAGGGCCA TCGCCAATTG GAGTATTTTG TTGATAATGG TCTGCTAGTT GAACGCTTCC 900

ATCTTCAATG TTGTGTCTAA TTTTGAAGTT AACTTTGATT CCATTCTTTT GTTTGTCTGC 960

CATGATGTAT ACATTGTGTG AGTTATAGTT GTATTCCAAT TTGTGTCCAA GAATGTTTCC 1020

ATCTTCTTTA AAATCAATAC CTTTTAACTC GATTCTATTA ACAAGGGTAT CACCTTCAAA 1080

CTTGACTTCA GCACGTGTCT TGTAGTTCCC GTCATCTTTG AAAAATATAG TTCTTTCCTG 1140

TACATAACCT TCGGGCATGG CACTCTTGAA AAAGTCATGC CGTTTCATAT GATCCGGGTA 1200

TCTTGAAAAG CATTGAACAC CATAAGAGAA AGTAGTGACA AGTGTTGGCC ATGGAACAGG 1260

TAGTTTTCCA GTAGTGCAAA TAAATTTAAG GGTAAGTTTT CCGTATGTTG CATCACCTTC 1320

ACCCTCTCCA CTGACAGAAA ATTTGTGCCC ATTAACATCA CCATCTAATT CAACAAGAAT 1380

TGGGACAACT CCAGTGAAGA GTTCTTCTCC TTTGCTAGCC ATTTCTTGCG CGATCGAATT 1440

CCTGCAGCCC GGGGGATCCA CTAGTTCTAG AGCGGCCGCC ACCGCGGTGG AGCTCCAGCT 1500

TTTGTTCCCT TTAGTGAGGG TTAATTCCGA GCTTGGCGTA ATCATGGTCA TAGCTGTTTC 1560

CTGTGTGAAA TTGTTATCCG CTCACAATTC CACACAACAT ACGAGCCGGA AGCATAAAGT 1620

GTAAAGCCTG GGGTGCCTAA TGAGTGAGCT AACTCACATT AATTGCGTTG CGCTCACTGC 1680

CCGCTTTCCA GTCGGGAAAC CTGTCGTGCC AGCTGCATTA ATGAATCGGC CAACGCGCGG 1740

GGAGAGGCGG TTTGCGTATT GGGCGCTCTT CCGCTTCCTC GCTCACTGAC TCGCTGCGCT 1800

CGGTCGTTCG GCTGCGGCGA GCGGTATCAG CTCACTCAAA GGCGGTAATA CGGTTATCCA 1860

CAGAATCAGG GGATAACGCA GGAAAGAACA TGTGAGCAAA AGGCCAGCAA AAGGCCAGGA 1920

ACCGTAAAAA GGCCGCGTTG CTGGCGTTTT TCCATAGGCT CCGCCCCCCT GACGAGCATC 1980

ACAAAAATCG ACGCTCAAGT CAGAGGTGGC GAAACCCGAC AGGACTATAA AGATACCAGG 2040

CGTTTCCCCC TGGAAGCTCC CTCGTGCGCT CTCCTGTTCC GACCCTGCCG CTTACCGGAT 2100

ACCTGTCCGC CTTTCTCCCT TCGGGAAGCG TGGCGCTTTC TCATAGCTCA CGCTGTAGGT 2160

ATCTCAGTTC GGTGTAGGTC GTTCGCTCCA AGCTGGGCTG TGTGCACGAA CCCCCCGTTC 2220

AGCCCGACCG CTGCGCCTTA TCCGGTAACT ATCGTCTTGA GTCCAACCCG GTAAGACACG 2280

ACTTATCGCC ACTGGCAGCA GCCACTGGTA ACAGGATTAG CAGAGCGAGG TATGTAGGCG 2340

GTGCTACAGA GTTCTTGAAG TGGTGGCCTA ACTACGGCTA CACTAGAAGG ACAGTATTTG 2400

GTATCTGCGC TCTGCTGAAG CCAGTTACCT TCGGAAAAAG AGTTGGTAGC TCTTGATCCG 2460

GCAAACAAAC CACCGCTGGT AGCGGTGGTT TTTTTGTTTG CAAGCAGCAG ATTACGCGCA 2520

GAAAAAAAGG ATCTCAAGAA GATCCTTTGA TCTTTTCTAC GGGGTCTGAC GCTCAGTGGA 2580

ACGAAAACTC ACGTTAAGGG ATTTTGGTCA TGAGATTATC AAAAAGGATC TTCACCTAGA 2640

TCCTTTTAAA TTAAAAATGA AGTTTTAAAT CAATCTAAAG TATATATGAG TAAACTTGGT 2700

CTGACAGTTA CCAATGCTTA ATCAGTGAGG CACCTATCTC AGCGATCTGT CTATTTCGTT 2760

CATCCATAGT TGCCTGACTC CCCGTCGTGT AGATAACTAC GATACGGGAG GGCTTACCAT 2820

CTGGCCCCAG TGCTGCAATG ATACCGCGAG ACCCACGCTC ACCGGCTCCA GATTTATCAG 2880

CAATAAACCA GCCAGCCGGA AGGGCCGAGC GCAGAAGTGG TCCTGCAACT TTATCCGCCT 2940

CCATCCAGTC TATTAATTGT TGCCGGGAAG CTAGAGTAAG TAGTTCGCCA GTTAATAGTT 3000

TGCGCAACGT TGTTGCCATT GCTACAGGCA TCGTGGTGTC ACGCTCGTCG TTTGGTATGG 3060

CTTCATTCAG CTCCGGTTCC CAACGATCAA GGCGAGTTAC ATGATCCCCC ATGTTGTGCA 3120

AAAAAGCGGT TAGCTCCTTC GGTCCTCCGA TCGTTGTCAG AAGTAAGTTG GCCGCAGTGT 3180

TATCACTCAT GGTTATGGCA GCACTGCATA ATTCTCTTAC TGTCATGCCA TCCGTAAGAT 3240

GCTTTTCTGT GACTGGTGAG TACTCAACCA AGTCATTCTG AGAATAGTGT ATGCGGCGAC 3300

CGAGTTGCTC TTGCCCGGCG TCAATACGGG ATAATACCGC GCCACATAGC AGAACTTTAA 3360

AAGTGCTCAT CATTGGAAAA CGTTCTTCGG GGCGAAAACT CTCAAGGATC TTACCGCTGT 3420

TGAGATCCAG TTCGATGTAA CCCACTCGTG CACCCAACTG ATCTTCAGCA TCTTTTACTT 3480

TCACCAGCGT TTCTGGGTGA GCAAAAACAG GAAGGCAAAA TGCCGCAAAA AAGGGAATAA 3540

GGGCGACACG GAAATGTTGA ATACTCATAC TCTTCCTTTT TCAATATTAT TGAAGCATTT 3600

ATCAGGGTTA TTGTCTCATG AGCGGATACA TATTTGAATG TATTTAGAAA AATAAACAAA 3660

TAGGGGTTCC GCGCACATTT CCCCGAAAAG TGCCACCTG 3699




(2) INFORMATION FOR SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 6361 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..6361

(D) OTHER INFORMATION: /note= "pFRED13"



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:

TTCTCATGTT TGACAGCTTA TCATCGATAA GCTTTAATGC GGTAGTTTAT CACAGTTAAA 60

TTGCTAACGC AGTCAGGCAC CGTGTATGAA ATCTAACAAT GCGCTCATCG TCATCCTCGG 120

CACCGTCACC CTGGATGCTG TAGGCATAGG CTTGGTTATG CCGGTACTGC CGGGCCTCTT 180

GCGGGATATC CGGATATAGT TCCTCCTTTC AGCAAAAAAC CCCTCAAGAC CCGTTTAGAG 240

GCCCCAAGGG GTTATGCTAG TTATTGCTCA GCGGTGGCAG CAGCCAACTC AGCTTCCTTT 300

CGGGCTTTGT TAGCAGCCGG ATCCTTATTT GTATAGTTCA TCCATGCCAT GTGTAATCCC 360

AGCAGCTGTT ACAAACTCAA GAAGGACCAT GTGGTCTCTC TTTTCGTTGG GATCTTTCGA 420

AAGGGCAGAT TGTGTGGACA GGTAATGGTT GTCTGGTAAA AGGACAGGGC CATCGCCAAT 480

TGGAGTATTT TGTTGATAAT GGTCTGCTAG TTGAACGCTT CCATCTTCAA TGTTGTGTCT 540

AATTTTGAAG TTAACTTTGA TTCCATTCTT TTGTTTGTCT GCCATGATGT ATACATTGTG 600

TGAGTTATAG TTGTATTCCA ATTTGTGTCC AAGAATGTTT CCATCTTCTT TAAAATCAAT 660

ACCTTTTAAC TCGATTCTAT TAACAAGGGT ATCACCTTCA AACTTGACTT CAGCACGTGT 720

CTTGTAGTTC CCGTCATCTT TGAAAAATAT AGTTCTTTCC TGTACATAAC CTTCGGGCAT 780

GGCACTCTTG AAAAAGTCAT GCCGTTTCAT ATGATCCGGG TATCTTGAAA AGCATTGAAC 840

ACCATAAGAG AAAGTAGTGA CAAGTGTTGG CCATGGAACA GGTAGTTTTC CAGTAGTGCA 900

AATAAATTTA AGGGTAAGTT TTCCGTATGT TGCATCACCT TCACCCTCTC CACTGACAGA 960

AAATTTGTGC CCATTAACAT CACCATCTAA TTCAACAAGA ATTGGGACAA CTCCAGTGAA 1020

GAGTTCTTCT CCTTTGCTAG CCATATGTAT ATCTCCTTCT TAAAGTTAAA CAAAATTATT 1080

TCTAGAGGGG AATTGTTATC CGCTCACAAT TCCCCTATAG TGAGTCGTAT TAATTTCGCG 1140

GGATCGAGAT CTCGATCCTC TACGCCGGAC GCATCGTGGC CGGCATCACC GGCGCCACAG 1200

GTGCGGTTGC TGGCGCCTAT ATCGCCGACA TCACCGATGG GGAAGATCGG GCTCGCCACT 1260

TCGGGCTCAT GAGCGCTTGT TTCGGCGTGG GTATGGTGGC AGGCCCCGTG GCCGGGGGAC 1320

TGTTGGGCGC CATCTCCTTG CATGCACCAT TCCTTGCGGC GGCGGTGCTC AACGGCCTCA 1380

ACCTACTACT GGGCTGCTTC CTAATGCAGG AGTCGCATAA GGGAGAGCGT CGAGATCCCG 1440

GACACCATCG AATGGCGCAA AACCTTTCGC GGTATGGCAT GATAGCGCCC GGAAGAGAGT 1500

CAATTCAGGG TGGTGAATGT GAAACCAGTA ACGTTATACG ATGTCGCAGA GTATGCCGGT 1560

GTCTCTTATC AGACCGTTTC CCGCGTGGTG AACCAGGCCA GCCACGTTTC TGCGAAAACG 1620

CGGGAAAAAG TGGAAGCGGC GATGGCGGAG CTGAATTACA TTCCCAACCG CGTGGCACAA 1680

CAACTGGCGG GCAAACAGTC GTTGCTGATT GGCGTTGCCA CCTCCAGTCT GGCCCTGCAC 1740

GCGCCGTCGC AAATTGTCGC GGCGATTAAA TCTCGCGCCG ATCAACTGGG TGCCAGCGTG 1800

GTGGTGTCGA TGGTAGAACG AAGCGGCGTC GAAGCCTGTA AAGCGGCGGT GCACAATCTT 1860

CTCGCGCAAC GCGTCAGTGG GCTGATCATT AACTATCCGC TGGATGACCA GGATGCCATT 1920

GCTGTGGAAG CTGCCTGCAC TAATGTTCCG GCGTTATTTC TTGATGTCTC TGACCAGACA 1980

CCCATCAACA GTATTATTTT CTCCCATGAA GACGGTACGC GACTGGGCGT GGAGCATCTG 2040

GTCGCATTGG GTCACCAGCA AATCGCGCTG TTAGCGGGCC CATTAAGTTC TGTCTCGGCG 2100

CGTCTGCGTC TGGCTGGCTG GCATAAATAT CTCACTCGCA ATCAAATTCA GCCGATAGCG 2160

GAACGGGAAG GCGACTGGAG TGCCATGTCC GGTTTTCAAC AAACCATGCA AATGCTGAAT 2220

GAGGGCATCG TTCCCACTGC GATGCTGGTT GCCAACGATC AGATGGCGCT GGGCGCAATG 2280

CGCGCCATTA CCGAGTCCGG GCTGCGCGTT GGTGCGGATA TCTCGGTAGT GGGATACGAC 2340

GATACCGAAG ACAGCTCATG TTATATCCCG CCGTTAACCA CCATCAAACA GGATTTTCGC 2400

CTGCTGGGGC AAACCAGCGT GGACCGCTTG CTGCAACTCT CTCAGGGCCA GGCGGTGAAG 2460

GGCAATCAGC TGTTGCCCGT CTCACTGGTG AAAAGAAAAA CCACCCTGGC GCCCAATACG 2520

CAAACCGCCT CTCCCCGCGC GTTGGCCGAT TCATTAATGC AGCTGGCACG ACAGGTTTCC 2580

CGACTGGAAA GCGGGCAGTG AGCGCAACGC AATTAATGTA AGTTAGCTCA CTCATTAGGC 2640

ACCGGGATCT CGACCGATGC CCTTGAGAGC CTTCAACCCA GTCAGCTCCT TCCGGTGGGC 2700

GCGGGGCATG ACTATCGTCG CCGCACTTAT GACTGTCTTC TTTATCATGC AACTCGTAGG 2760

ACAGGTGCCG GCAGCGCTCT GGGTCATTTT CGGCGAGGAC CGCTTTCGCT GGAGCGCGAC 2820

GATGATCGGC CTGTCGCTTG CGGTATTCGG AATCTTGCAC GCCCTCGCTC AAGCCTTCGT 2880

CACTGGTCCC GCCACCAAAC GTTTCGGCGA GAAGCAGGCC ATTATCGCCG GCATGGCGGC 2940

CGACGCGCTG GGCTACGTCT TGCTGGCGTT CGCGACGCGA GGCTGGATGG CCTTCCCCAT 3000

TATGATTCTT CTCGCTTCCG GCGGCATCGG GATGCCCGCG TTGCAGGCCA TGCTGTCCAG 3060

GCAGGTAGAT GACGACCATC AGGGACAGCT TCAAGGATCG CTCGCGGCTC TTACCAGCCT 3120

AACTTCGATC ACTGGACCGC TGATCGTCAC GGCGATTTAT GCCGCCTCGG CGAGCACATG 3180

GAACGGGTTG GCATGGATTG TAGGCGCCGC CCTATACCTT GTCTGCCTCC CCGCGTTGCG 3240

TCGCGGTGCA TGGAGCCGGG CCACCTCGAC CTGAATGGAA GCCGGCGGCA CCTCGCTAAC 3300

GGATTCACCA CTCCAAGAAT TGGAGCCAAT CAATTCTTGC GGAGAACTGT GAATGCGCAA 3360

ACCAACCCTT GGCAGAACAT ATCCATCGCG TCCGCCATCT CCAGCAGCCG CACGCGGCGC 3420

ATCTCGGGCA GCGTTGGGTC CTGGCCACGG GTGCGCATGA TCGTGCTCCT GTCGTTGAGG 3480

ACCCGGCTAG GCTGGCGGGG TTGCCTTACT GGTTAGCAGA ATGAATCACC GATACGCGAG 3540

CGAACGTGAA GCGACTGCTG CTGCAAAACG TCTGCGACCT GAGCAACAAC ATGAATGGTC 3600

TTCGGTTTCC GTGTTTCGTA AAGTCTGGAA ACGCGGAAGT CAGCGCCCTG CACCATTATG 3660

TTCCGGATCT GCATCGCAGG ATGCTGCTGG CTACCCTGTG GAACACCTAC ATCTGTATTA 3720

ACGAAGCGCT GGCATTGACC CTGAGTGATT TTTCTCTGGT CCCGCCGCAT CCATACCGCC 3780

AGTTGTTTAC CCTCACAACG TTCCAGTAAC CGGGCATGTT CATCATCAGT AACCCGTATC 3840

GTGAGCATCC TCTCTCGTTT CATCGGTATC ATTACCCCCA TGAACAGAAA TCCCCCTTAC 3900

ACGGAGGCAT CAGTGACCAA ACAGGAAAAA ACCGCCCTTA ACATGGCCCG CTTTATCAGA 3960

AGCCAGACAT TAACGCTTCT GGAGAAACTC AACGAGCTGG ACGCGGATGA ACAGGCAGAC 4020

ATCTGTGAAT CGCTTCACGA CCACGCTGAT GAGCTTTACC GCAGCTGCCT CGCGCGTTTC 4080

GGTGATGACG GTGAAAACCT CTGACACATG CAGCTCCCGG AGACGGTCAC AGCTTGTCTG 4140

TAAGCGGATG CCGGGAGCAG ACAAGCCCGT CAGGGCGCGT CAGCGGGTGT TGGCGGGTGT 4200

CGGGGCGCAG CCATGACCCA GTCACGTAGC GATAGCGGAG TGTATACTGG CTTAACTATG 4260

CGGCATCAGA GCAGATTGTA CTGAGAGTGC ACCATATATG CGGTGTGAAA TACCGCACAG 4320

ATGCGTAAGG AGAAAATACC GCATCAGGCG CTCTTCCGCT TCCTCGCTCA CTGACTCGCT 4380

GCGCTCGGTC GTTCGGCTGC GGCGAGCGGT ATCAGCTCAC TCAAAGGCGG TAATACGGTT 4440

ATCCACAGAA TCAGGGGATA ACGCAGGAAA GAACATGTGA GCAAAAGGCC AGCAAAAGGC 4500

CAGGAACCGT AAAAAGGCCG CGTTGCTGGC GTTTTTCCAT AGGCTCCGCC CCCCTGACGA 4560

GCATCACAAA AATCGACGCT CAAGTCAGAG GTGGCGAAAC CCGACAGGAC TATAAAGATA 4620

CCAGGCGTTT CCCCCTGGAA GCTCCCTCGT GCGCTCTCCT GTTCCGACCC TGCCGCTTAC 4680

CGGATACCTG TCCGCCTTTC TCCCTTCGGG AAGCGTGGCG CTTTCTCATA GCTCACGCTG 4740

TAGGTATCTC AGTTCGGTGT AGGTCGTTCG CTCCAAGCTG GGCTGTGTGC ACGAACCCCC 4800

CGTTCAGCCC GACCGCTGCG CCTTATCCGG TAACTATCGT CTTGAGTCCA ACCCGGTAAG 4860

ACACGACTTA TCGCCACTGG CAGCAGCCAC TGGTAACAGG ATTAGCAGAG CGAGGTATGT 4920

AGGCGGTGCT ACAGAGTTCT TGAAGTGGTG GCCTAACTAC GGCTACACTA GAAGGACAGT 4980

ATTTGGTATC TGCGCTCTGC TGAAGCCAGT TACCTTCGGA AAAAGAGTTG GTAGCTCTTG 5040

ATCCGGCAAA CAAACCACCG CTGGTAGCGG TGGTTTTTTT GTTTGCAAGC AGCAGATTAC 5100

GCGCAGAAAA AAAGGATCTC AAGAAGATCC TTTGATCTTT TCTACGGGGT CTGACGCTCA 5160

GTGGAACGAA AACTCACGTT AAGGGATTTT GGTCATGAGA TTATCAAAAA GGATCTTCAC 5220

CTAGATCCTT TTAAATTAAA AATGAAGTTT TAAATCAATC TAAAGTATAT ATGAGTAAAC 5280

TTGGTCTGAC AGTTACCAAT GCTTAATCAG TGAGGCACCT ATCTCAGCGA TCTGTCTATT 5340

TCGTTCATCC ATAGTTGCCT GACTCCCCGT CGTGTAGATA ACTACGATAC GGGAGGGCTT 5400

ACCATCTGGC CCCAGTGCTG CAATGATACC GCGAGACCCA CGCTCACCGG CTCCAGATTT 5460

ATCAGCAATA AACCAGCCAG CCGGAAGGGC CGAGCGCAGA AGTGGTCCTG CAACTTTATC 5520

CGCCTCCATC CAGTCTATTA ATTGTTGCCG GGAAGCTAGA GTAAGTAGTT CGCCAGTTAA 5580

TAGTTTGCGC AACGTTGTTG CCATTGCTGC AGGCATCGTG GTGTCACGCT CGTCGTTTGG 5640

TATGGCTTCA TTCAGCTCCG GTTCCCAACG ATCAAGGCGA GTTACATGAT CCCCCATGTT 5700

GTGCAAAAAA GCGGTTAGCT CCTTCGGTCC TCCGATCGTT GTCAGAAGTA AGTTGGCCGC 5760

AGTGTTATCA CTCATGGTTA TGGCAGCACT GCATAATTCT CTTACTGTCA TGCCATCCGT 5820

AAGATGCTTT TCTGTGACTG GTGAGTACTC AACCAAGTCA TTCTGAGAAT AGTGTATGCG 5880

GCGACCGAGT TGCTCTTGCC CGGCGTCAAC ACGGGATAAT ACCGCGCCAC ATAGCAGAAC 5940

TTTAAAAGTG CTCATCATTG GAAAACGTTC TTCGGGGCGA AAACTCTCAA GGATCTTACC 6000

GCTGTTGAGA TCCAGTTCGA TGTAACCCAC TCGTGCACCC AACTGATCTT CAGCATCTTT 6060

TACTTTCACC AGCGTTTCTG GGTGAGCAAA AACAGGAAGG CAAAATGCCG CAAAAAAGGG 6120

AATAAGGGCG ACACGGAAAT GTTGAATACT CATACTCTTC CTTTTTCAAT ATTATTGAAG 6180

CATTTATCAG GGTTATTGTC TCATGAGCGG ATACATATTT GAATGTATTT AGAAAAATAA 6240

ACAAATAGGG GTTCCGCGCA CATTTCCCCG AAAAGTGCCA CCTGACGTCT AAGAAACCAT 6300

TATTATCATG ACATTAACCT ATAAAAATAG GCGTATCACG AGGCCCTTTC GTCTTCAAGA 6360

A 6361

(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 48 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..48

(D) OTHER INFORMATION: /note= "oligonucleotide #17422"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:
CAATTTGTGT CCCAGAATGT TGCCATCTTC CTTGAAGTCA ATACCTTT 48

(2) INFORMATION FOR SEQ ID NO: 9:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 47 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..47

(D) OTHER INFORMATION: /note= "oligonucleotide #17423" 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 
GTCTTGTAGT TGCCGTCATC TTTGAAGAAG ATGCTCCTTT CCTGTAC 



(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 52 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..52

(D) OTHER INFORMATION: /note= "oligonucleotide #17424"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:
CATGGAACAG GCAGTTTGCC AGTAGTGCAG ATGAACTTCA GGGTAAGTTT TC 



(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 40 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(ix) FEATURE: 

(A) NAME/KEY: - 

(B) LOCATION: 1..40

(D) OTHER INFORMATION: /note= "oligonucleotide #17425"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:
CTCCACTGAC AGAGAACTTG TGGCCGTTAA CATCACCATC 



(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..47

(D) OTHER INFORMATION: /note= "oligonucleotide #17426"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 
CCATCTTCAA TGTTGTGGCG GGTCTTGAAG TTCACTTTGA TTCCATT 47 

(2) INFORMATION FOR SEQ ID NO: 13: 



(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 41 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..41

(D) OTHER INFORMATION: /note= "oligonucleotide #17465"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:
CGATAAGCTT GAGGATCCTC AGTTGTACAG TTCATCCATG C 41 

(2) INFORMATION FOR SEQ ID NO: 14: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 849 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..849

(D) OTHER INFORMATION: /note= "pBSGFPsg11"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 
ATGACCATGA TTACGCCAAG CTCGGAATTA ACCCTCACTA AAGGGAACAA AAGCTGGAGC 60

TCCACCGCGG TGGCGGCCGC TCTAGAACTA GTGGATCCCC CGGGCTGCAG GAATTCGATC 120

GCGCAAGAAA TGGCTAGCAA AGGAGAAGAA CTCTTCACTG GAGTTGTCCC AATTCTTGTT 180

GAATTAGATG GTGATGTTAA CGGCCACAAG TTCTCTGTCA GTGGAGAGGG TGAAGGTGAT 240

GCAACATACG GAAAACTTAC CCTGAAGTTC ATCTGCACTA CTGGCAAACT GCCTGTTCCA 300

TGGCCAACAC TTGTCACTAC TCTCTCTTAT GGTGTTCAAT GCTTTTCAAG ATACCCGGAT 360

CATATGAAAC GGCATGACTT TTTCAAGAGT GCCATGCCCG AAGGTTATGT ACAGGAAAGG 420

ACCATCTTCT TCAAAGATGA CGGCAACTAC AAGACACGTG CTGAAGTCAA GTTTGAAGGT 480

GATACCCTTG TTAATAGAAT CGAGTTAAAA GGTATTGACT TCAAGGAAGA TGGCAACATT 540
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75 

CTGGGACACA AATTGGAATA CAACTATAAC TCACACAATG TATACATCAT GGCAGACAAA 
CAAAAGAATG GAATCAAAGT GAACTTCAAG ACCCGCCACA ACATTGAAGA TGGAAGCGTT 
CAACTAGCAG ACCATTATCA ACAAAATACT CCAATTGGCG ATGGCCCTGT CCTTTTACCA 
GACAACCATT ACCTGTCCAC ACAATCTGCC CTTTCGAAAG ATCCCAACGA AAAGAGAGAC 
CACATGGTCC TTCTTGAGTT TGTAACAGCT GCTGGGATTA CACATGGCAT GGATGAACTG 



(2) INFORMATION FOR SEQ ID NO: 15; 



(2) INFORMATION FOR SEQ ID NO:16: 

60 <i> SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 720 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 



(ii> MOLECULE TYPE: DNA 
(ix) FEATURE: 



600 
660 
720 
780 



1Q *w*™w*v^4 "uu^iih V.AUATGGCAT GGATGAACTG 840 

TACAACTGA 

849 



(i> SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 720 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
20 (D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

25 (ix) FEATURE: 

(A) NAME /KEY : - 

(B) LOCATION: 1..720 

(D) OTHER INFORMATION: /note* "SG12" 

30 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:15: 
ATGGCTAGCA AAGGAGAAGA ACTCTTCACT GGAGTTGTCC CAATTCTTGT TGAATTAGAT 60 

35 GGTGATGTTA ACGGCCACAA GTTCTCTGTC AGTGGAGAGG GTGAAGGTGA TGCAACATAC 

GGAAAACTTA CCCTGAAGTT CATCTGCACT ACTGGCAAAC TGCCTGTTCC ATGGCCAACA 
CTTGTCACTA CTCTCTCTTA TGGTGTTCAA TGCTTTTCAA GATACCCGGA TCATATGAAA 240 
CGGCATGACT TTTTCAAGAG TGCCATGCCC GAAGGTTATG TACAGGAAAG GACCATCTTC 300 
TTCAAAGATG ACGGCAACTA CAAGACACGT GCTGAAGTCA AGTTTGAAGG TGATACCCTT 

45 GTTAATAGAA TCGAGTTAAA AGGTATTGAT TTTAAAGAAG ATGGAAACAT TCTTGGACAC 

AAATTGGAAT ACAACTATAA CTCACACAAT GTATACATCA TGGCAGACAA ACAAAAGAAT 
GGAATCAAAG TTAACTT CAA AATTAGACAC AACATTGAAG ATGGAAGCGT TCAACTAGCA 
GACCATTATC AACAAAATAC TCCAATTGGC GATGGCCCTG TCCTTTTACC AGACAACCAT 
TACCTGTCCA CACAATCTGC CCTTTCGAAA GATCCCAACG AAAAGAGAGA CCACATGGTC 

55 CTTCTTGAGT TTGTAACAGC TGCTGGGATT ACACATGGCA TGGATGAACT ATACAAATAA 



120 
180 



360 
420 
480 



660 

720 
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(A) NAME /KEY : - 

(B) LOCATION: 1..720 

(D) OTHER INFORMATION: /note= "SG11" 

(xi) SEQUENCE DESCRIPTION: SEQ ID N0:16: 
ATGGCTAGCA AAGGAGAAGA ACTCTTCACT GGAGTTGTCC CAATTCTTGT TGAATTAGAT 
GGTGATGTTA ACGGCCACAA GTTCTCTGTC AGTGGAGAGG GTGAAGGTGA TGCAACATAC 
GGAAAACTTA CCCTGAAGTT CATCTGCACT ACTGGCAAAC TGCCTGTTCC ATGGCCAACA 
CTTGTCACTA CTCTCTCTTA TGGTGTTCAA TGCTTTTCAA GATACCCGGA TCATATGAAA 
CGGCATGACT TTTTCAAGAG TGCCATGCCC GAAGGTTATG TACAGGAAAG GACCATCTTC 
TTCAAAGATG ACGGCAACTA CAAGACACGT GCTGAAGTCA AGTTTGAAGG TGATACCCTT 
GTTAATAGAA TCGAGTTAAA AGGTATTGAC TTCAAGGAAG ATGGCAACAT TCTGGGACAC 
AAATTGGAAT ACAACTATAA CTCACACAAT GTATACATCA TGGCAGACAA ACAAAAGAAT 
GGAATCAAAG TGAACTTCAA GACCCGCCAC AACATTGAAG ATGGAAGCGT TCAACTAGCA 
GACCATTATC AACAAAATAC TCCAATTGGC GATGGCCCTG TCCTTTTACC AGACAACCAT 
TACCTGTCCA CACAATCTGC CCTTTC G AAA GATCCCAACG AAAAGAGAGA CCACATGGTC 
CTTCTTGAGT TTGTAACAGC TG CTGGGATT ACACATGGCA TGGATGAACT GTACAACTGA 

(2) INFORMATION FOR SEQ ID NO: 17:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 720 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..720

(D) OTHER INFORMATION: /note= "SG25"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17:
ATGGCTAGCA AAGGAGAAGA ACTCTTCACT GGAGTTGTCC CAATTCTTGT TGAATTAGAT 60
GGTGATGTTA ACGGCCACAA GTTCTCTGTC AGTGGAGAGG GTGAAGGTGA TGCAACATAC 120
GGAAAACTTA CCCTGAAGTT CATCTGCACT ACTGGCAAAC TGCCTGTTCC ATGGCCAACA 180
CTAGTCACTA CTCTGTGCTA TGGTGTTCAA TGCTTTTCAA GATACCCGGA TCATATGAAA 240
CGGCATGACT TTTTCAAGAG TGCCATGCCC GAAGGTTATG TACAGGAAAG GACCATCTTC 300
TTCAAAGATG ACGGCAACTA CAAGACACGT GCTGAAGTCA AGTTTGAAGG TGATACCCTT 360
GTTAATAGAA TCGAGTTAAA AGGTATTGAC TTCAAGGAAG ATGGCAACAT TCTGGGACAC 420
AAATTGGAAT ACAACTATAA CTCACACAAT GTATACATCA TGGCAGACAA ACAAAAGAAT 480
GGAATCAAAG TGAACTTCAA GACCCGCCAC AACATTGAAG ATGGAAGCGT TCAACTAGCA 540
GACCATTATC AACAAAATAC TCCAATTGGC GATGGCCCTG TCCTTTTACC AGACAACCAT 600
TACCTGTCCA CACAATCTGC CCTTTCGAAA GATCCCAACG AAAAGAGAGA CCACATGGTC 660
CTTCTTGAGT TTGTAACAGC TGCTGGGATT ACACATGGCA TGGATGAACT GTACAACTGA 720

(2) INFORMATION FOR SEQ ID NO: 18:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 40 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..40

(D) OTHER INFORMATION: /note= "oligonucleotide #18217"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18:
CATTGAACAC CATAGCACAG AGTAGTGACT AGTGTTGGCC 40 
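
Oligonucleotide #18217 (SEQ ID NO: 18) is the exact reverse complement of nucleotides 173-212 of SG25 (SEQ ID NO: 17), a stretch in which SG25 differs from SG11 (SEQ ID NO: 16): the 181-240 line reads CTAGTCACTA CTCTGTGCTA in SG25 but CTTGTCACTA CTCTCTCTTA in SG11. The listing itself does not say how the oligonucleotide was used. The Python sketch below is illustrative only (all function and variable names are hypothetical) and shows one way to check such a relationship against the listed sequences.

```python
# Map each base to its Watson-Crick partner.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def clean(text: str) -> str:
    """Keep only A/C/G/T, dropping the spaces and position numbers used in the listing."""
    return "".join(ch for ch in text.upper() if ch in "ACGT")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_reverse_complement(oligo: str, template: str) -> int:
    """0-based start of the oligo's reverse complement in template, or -1 if absent."""
    return template.find(reverse_complement(oligo))


# Nucleotides 121-240 of SG25 (SEQ ID NO: 17) and SG11 (SEQ ID NO: 16), as listed.
sg25_121_240 = clean(
    "GGAAAACTTA CCCTGAAGTT CATCTGCACT ACTGGCAAAC TGCCTGTTCC ATGGCCAACA "
    "CTAGTCACTA CTCTGTGCTA TGGTGTTCAA TGCTTTTCAA GATACCCGGA TCATATGAAA"
)
sg11_121_240 = clean(
    "GGAAAACTTA CCCTGAAGTT CATCTGCACT ACTGGCAAAC TGCCTGTTCC ATGGCCAACA "
    "CTTGTCACTA CTCTCTCTTA TGGTGTTCAA TGCTTTTCAA GATACCCGGA TCATATGAAA"
)
oligo_18217 = clean("CATTGAACAC CATAGCACAG AGTAGTGACT AGTGTTGGCC")

start = find_reverse_complement(oligo_18217, sg25_121_240)
first = 121 + start  # convert the 0-based offset to a 1-based SG25 position
print(first, first + len(oligo_18217) - 1)  # prints: 173 212
print(find_reverse_complement(oligo_18217, sg11_121_240))  # prints: -1
```

A result of -1 for SG11 only means there is no exact match; the sketch does not score partial matches or mismatches, which a real primer check would need.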

(2) INFORMATION FOR SEQ ID NO: 19:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 720 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..720

(D) OTHER INFORMATION: /note= "SB42"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19:
ATGGCTAGCA AAGGAGAAGA ACTCTTCACT GGAGTTGTCC CAATTCTTGT TGAATTAGAT 60
GGTGATGTTA ACGGCCACAA GTTCTCTGTC AGTGGAGAGG GTGAAGGTGA TGCAACATAC 120
GGAAAACTTA CCCTGAAGTT CATCTGCACT ACTGGCAAAC TGCCTGTTCC ATGGCCAACA 180
CTAGTCACTA CTCTCTCTCA TGGTGTTCAA TGCTTTTCAA GATACCCGGA TCATATGAAA 240
CGGCATGACT TTTTCAAGAG TGCCATGCCC GAAGGTTATG TACAGGAAAG GACCATCTTC 300
TTCAAAGATG ACGGCAACTA CAAGACACGT GCTGAAGTCA AGTTTGAAGG TGATACCCTT 360
GTTAATAGAA TCGAGTTAAA AGGTATTGAT TTTAAAGAAG ATGGAAACAT TCTTGGACAC 420
AAATTGGAAT ACAACTATAA CTCACACAAT GTATACATCA TGGCAGACAA ACAAAAGAAT 480
GGAATCAAAG TTAACTTCAA AATTAGACAC AACATTGAAG ATGGAAGCGT TCAACTAGCA 540
GACCATTATC AACAAAATAC TCCAATTGGC GATGGCCCTG TCCTTTTACC AGACAACCAT 600
TACCTGTCCA CACAATCTGC CCTTTCGAAA GATCCCAACG AAAAGAGAGA CCACATGGTC 660

(2) INFORMATION FOR SEQ ID NO: 22:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 44 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..44

(D) OTHER INFORMATION: /note= "oligonucleotide #19059"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22:
CTTCAATGTT GTGGCGGATC TTGAAGTTCG CTTTGATTCC ATTC 44



(2) INFORMATION FOR SEQ ID NO: 23:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 40 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..40

(D) OTHER INFORMATION: /note= "oligonucleotide #bio24"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23:
CATTGAACAC CATGAGAGAA AGTAGTGACT AGTGTTGGCC 40



(2) INFORMATION FOR SEQ ID NO: 24:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 720 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..720

(D) OTHER INFORMATION: /note= "SB50"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24:
ATGGCTAGCA AAGGAGAAGA ACTCTTCACT GGAGTTGTCC CAATTCTTGT TGAATTAGAT 60
GGTGATGTTA ACGGCCACAA GTTCTCTGTC AGTGGAGAGG GTGAAGGTGA TGCAACATAC 120
GGAAAACTTA CCCTGAAGTT CATCTGCACT ACTGGCAAAC TGCCTGTTCC ATGGCCAACA 180
CTAGTCACTA CTCTCTCTCA TGGTGTTCAA TGCTTTTCAA GATACCCGGA TCATATGAAA 240
CGGCATGACT TTTTCAAGAG TGCCATGCCC GAAGGTTATG TACAGGAAAG GACCATCTTC 300
TTCAAAGATG ACGGCAACTA CAAGACACGT GCTGAAGTCA AGTTTGAAGG TGATACCCTT 360
GTTAATAGAA TCGAGTTAAA AGGTATTGAT TTTAAAGAAG ATGGAAACAT TCTTGGACAC 420
AAATTGGAAT ACAACTATAA CTCACACAAT GTATACATCA TGGCAGACAA ACAAAAGAAT 480
GGAATCAAAG CGAACTTCAA GATCCGCCAC AACATTGAAG ATGGAAGCGT TCAACTAGCA 540
GACCATTATC AACAAAATAC TCCAATTGGC GATGGCCCTG TCCTTTTACC AGACAACCAT 600
TACCTGTCCA CACAATCTGC CCTTTCGAAA GATCCCAACG AAAAGAGAGA CCACATGGTC 660
CTTCTTGAGT TTGTAACAGC TGCTGGGATT ACACATGGCA TGGATGAACT ATACAAATAA 720

(2) INFORMATION FOR SEQ ID NO: 25:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1521 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..1521

(D) OTHER INFORMATION: /note= "pCMVgfp11"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25:
ATGGCTAGCA AAGGAGAAGA ACTCTTCACT GGAGTTGTCC CAATTCTTGT TGAATTAGAT 60
GGTGATGTTA ACGGCCACAA GTTCTCTGTC AGTGGAGAGG GTGAAGGTGA TGCAACATAC 120
GGAAAACTTA CCCTGAAGTT CATCTGCACT ACTGGCAAAC TGCCTGTTCC ATGGCCAACA 180
CTTGTCACTA CTCTCTCTTA TGGTGTTCAA TGCTTTTCAA GATACCCGGA TCATATGAAA 240
CGGCATGACT TTTTCAAGAG TGCCATGCCC GAAGGTTATG TACAGGAAAG GACCATCTTC 300
TTCAAAGATG ACGGCAACTA CAAGACACGT GCTGAAGTCA AGTTTGAAGG TGATACCCTT 360
GTTAATAGAA TCGAGTTAAA AGGTATTGAC TTCAAGGAAG ATGGCAACAT TCTGGGACAC 420
AAATTGGAAT ACAACTATAA CTCACACAAT GTATACATCA TGGCAGACAA ACAAAAGAAT 480
GGAATCAAAG TGAACTTCAA GACCCGCCAC AACATTGAAG ATGGAAGCGT TCAACTAGCA 540
GACCATTATC AACAAAATAC TCCAATTGGC GATGGCCCTG TCCTTTTACC AGACAACCAT 600
TACCTGTCCA CACAATCTGC CCTTTCGAAA GATCCCAACG AAAAGAGAGA CCACATGGTC 660
CTTCTTGAGT TTGTAACAGC TGCTGGGATT ACACATGGCA TGGATGAACT GTACAACGGT 720
GCTGGTGCTA TCGAACAAGA TGGATTGCAC GCAGGTTCTC CGGCCGCTTG GGTGGAGAGG 780
CTATTCGGCT ATGACTGGGC ACAACAGACA ATCGGCTGCT CTGATGCCGC CGTGTTCCGG 840
CTGTCAGCGC AGGGGCGCCC GGTTCTTTTT GTCAAGACCG ACCTGTCCGG TGCCCTGAAT 900
GAACTGCAGG ACGAGGCAGC GCGGCTATCG TGGCTGGCCA CGACGGGCGT TCCTTGCGCA 960
GCTGTGCTCG ACGTTGTCAC TGAAGCGGGA AGGGACTGGC TGCTATTGGG CGAAGTGCCG 1020
GGGCAGGATC TCCTGTCATC TCACCTTGCT CCTGCCGAGA AAGTATCCAT CATGGCTGAT 1080
GCAATGCGGC GGCTGCATAC GCTTGATCCG GCTACCTGCC CATTCGACCA CCAAGCGAAA 1140
CATCGCATCG AGCGAGCACG TACTCGGATG GAAGCCGGTC TTGTCGATCA GGATGATCTG 1200
GACGAAGAGC ATCAGGGGCT CGCGCCAGCC GAACTGTTCG CCAGGCTCAA GGCGCGCATG 1260
CCCGACGGCG AGGATCTCGT CGTGACCCAT GGCGATGCCT GCTTGCCGAA TATCATGGTG 1320
GAAAATGGCC GCTTTTCTGG ATTCATCGAC TGTGGCCGGC TGGGTGTGGC GGACCGCTAT 1380
CAGGACATAG CGTTGGCTAC CCGTGATATT GCTGAAGAGC TTGGCGGCGA ATGGGCTGAC 1440
CGCTTCCTCG TGCTTTACGG TATCGCCGCT CCCGATTCGC AGCGCATCGC CTTCTATCGC 1500
CTTCTTGACG AGTTCTTCTG A 1521


(2) INFORMATION FOR SEQ ID NO: 26:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 4 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26:

Gly Ala Gly Ala 
1 

(2) INFORMATION FOR SEQ ID NO: 27:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 32 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..32

(D) OTHER INFORMATION: /note= "primer Bio51"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27:
CGCGGATCCT TCGAACAAGA TGGATTGCAC GC 32 

(2) INFORMATION FOR SEQ ID NO: 28:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 34 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..34

(D) OTHER INFORMATION: /note= "primer Bio52"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28:
CCGGAATTCT CAGAAGAACT CGTCAAGAAG GCGA 34 

(2) INFORMATION FOR SEQ ID NO: 29:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 46 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..46

(D) OTHER INFORMATION: /note= "primer Bio49"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29:
GGCGCGCAAG AAATGGCTAG CAAAGGAGAA GAACTCTTCA CTGGAG 46 



(2) INFORMATION FOR SEQ ID NO: 30:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 46 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..46

(D) OTHER INFORMATION: /note= "primer Bio50"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30:
CCCATCGATA GCACCAGCAC CGTTGTACAG TTCATCCATG CCATGT 46

(2) INFORMATION FOR SEQ ID NO: 31:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1521 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..1521

(D) OTHER INFORMATION: /note= "pPGKgfp25"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31:
ATGGCTAGCA AAGGAGAAGA ACTCTTCACT GGAGTTGTCC CAATTCTTGT TGAATTAGAT 60
GGTGATGTTA ACGGCCACAA GTTCTCTGTC AGTGGAGAGG GTGAAGGTGA TGCAACATAC 120
GGAAAACTTA CCCTGAAGTT CATCTGCACT ACTGGCAAAC TGCCTGTTCC ATGGCCAACA 180
CTAGTCACTA CTCTGTGCTA TGGTGTTCAA TGCTTTTCAA GATACCCGGA TCATATGAAA 240
CGGCATGACT TTTTCAAGAG TGCCATGCCC GAAGGTTATG TACAGGAAAG GACCATCTTC 300
TTCAAAGATG ACGGCAACTA CAAGACACGT GCTGAAGTCA AGTTTGAAGG TGATACCCTT 360
GTTAATAGAA TCGAGTTAAA AGGTATTGAC TTCAAGGAAG ATGGCAACAT TCTGGGACAC 420
AAATTGGAAT ACAACTATAA CTCACACAAT GTATACATCA TGGCAGACAA ACAAAAGAAT 480
GGAATCAAAG TGAACTTCAA GACCCGCCAC AACATTGAAG ATGGAAGCGT TCAACTAGCA 540
GACCATTATC AACAAAATAC TCCAATTGGC GATGGCCCTG TCCTTTTACC AGACAACCAT 600
TACCTGTCCA CACAATCTGC CCTTTCGAAA GATCCCAACG AAAAGAGAGA CCACATGGTC 660
CTTCTTGAGT TTGTAACAGC TGCTGGGATT ACACATGGCA TGGATGAACT GTACAACGGT 720
GCTGGTGCTA TCGAACAAGA TGGATTGCAC GCAGGTTCTC CGGCCGCTTG GGTGGAGAGG 780
CTATTCGGCT ATGACTGGGC ACAACAGACA ATCGGCTGCT CTGATGCCGC CGTGTTCCGG 840
CTGTCAGCGC AGGGGCGCCC GGTTCTTTTT GTCAAGACCG ACCTGTCCGG TGCCCTGAAT 900
GAACTGCAGG ACGAGGCAGC GCGGCTATCG TGGCTGGCCA CGACGGGCGT TCCTTGCGCA 960
GCTGTGCTCG ACGTTGTCAC TGAAGCGGGA AGGGACTGGC TGCTATTGGG CGAAGTGCCG 1020
GGGCAGGATC TCCTGTCATC TCACCTTGCT CCTGCCGAGA AAGTATCCAT CATGGCTGAT 1080
GCAATGCGGC GGCTGCATAC GCTTGATCCG GCTACCTGCC CATTCGACCA CCAAGCGAAA 1140
CATCGCATCG AGCGAGCACG TACTCGGATG GAAGCCGGTC TTGTCGATCA GGATGATCTG 1200
GACGAAGAGC ATCAGGGGCT CGCGCCAGCC GAACTGTTCG CCAGGCTCAA GGCGCGCATG 1260
CCCGACGGCG AGGATCTCGT CGTGACCCAT GGCGATGCCT GCTTGCCGAA TATCATGGTG 1320
GAAAATGGCC GCTTTTCTGG ATTCATCGAC TGTGGCCGGC TGGGTGTGGC GGACCGCTAT 1380
CAGGACATAG CGTTGGCTAC CCGTGATATT GCTGAAGAGC TTGGCGGCGA ATGGGCTGAC 1440
CGCTTCCTCG TGCTTTACGG TATCGCCGCT CCCGATTCGC AGCGCATCGC CTTCTATCGC 1500
CTTCTTGACG AGTTCTTCTG A 1521



(2) INFORMATION FOR SEQ ID NO: 32:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 26 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..26

(D) OTHER INFORMATION: /note= "oligonucleotide #18990"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32:
GACCGGGACA CGTATCCAGC CTCCGC 26 

(2) INFORMATION FOR SEQ ID NO: 33:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 28 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..28

(D) OTHER INFORMATION: /note= "oligonucleotide #18991"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33:
GGAGGCTGGA TACGTGTCCC GGTCTGCA 28

(2) INFORMATION FOR SEQ ID NO: 34: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 7617 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: DNA 



(ix) FEATURE: 



WO 97/42320 



PCT/US97/07625 



(A) NAME/KEY: -

(B) LOCATION: 1..7617

(D) OTHER INFORMATION: /note= "pGen-PGKgfp25R0"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34:
TCGAGGTCGA CGGTATCGAT TAGTCCAATT TGTTAAAGAC AGGATATCAG TGGTCCAGGC 60
TCTAGTTTTG ACTCAACAAT ATCACCAGCT GAAGCCTATA GAGTACGAGC CATAGATAAA 120
ATAAAAGATT TTATTTAGTC TCCAGAAAAA GGGGGGAATG AAAGACCCCA CCTGTAGGTT 180
TGGCAAGCTA GCTTAAGTAA CGCCATTTTG CAAGGCATGG AAAAATACAT AACTGAGAAT 240
AGAGAAGTTC AGATCAAGGT CAGGAACAGA TGGAACAGCT GAATATGGGC CAAACAGGAT 300
ATCTGTGGTA AGCAGTTCCT GCCCCGGCTC AGGGCCAAGA ACAGATGGAA CAGCTGAATA 360
TGGGCCAAAC AGGATATCTG TGGTAAGCAG TTCCTGCCCC GGCTCAGGGC CAAGAACAGA 420
TGGTCCCCAG ATGCGGTCCA GCCCTCAGCA GTTTCTAGAG AACCATCAGA TGTTTCCAGG 480
GTGCCCCAAG GACCTGAAAT GACCCTGTGC CTTATTTGAA CTAACCAATC AGTTCGCTTC 540
TCGCTTCTGT TCGCGCGCTT CTGCTCCCCG AGCTCAATAA AAGAGCCCAC AACCCCTCAC 600
TCGGGGCGCC AGTCCTCCGA TTGACTGAGT CGCCCGGGTA CCCGTGTATC CAATAAACCC 660
TCTTGCAGTT GCATCCGACT TGTGGTCTCG CTGTTCCTTG GGAGGGTCTC CTCTGAGTGA 720
TTGACTACCC GTCAGCGGGG GTCTTTCATT TGGGGGCTCG TCCGGGATCG GGAGACCCCT 780
GCCCAGGGAC CACCGACCCA CCACCGGGAG GTAAGCTGGC CAGCAACTTA TCTGTGTCTG 840
TCCGATTGTC TAGTGTCTAT GACTGATTTT ATGCGCCTGC GTCGGTACTA GTTAGCTAAC 900
TAGCTCTGTA TCTGGCGGAC CCGTGGTGGA ACTGACGAGT TCGGAACACC CGGCCGCAAC 960
CCTGGGAGAC GTCCCAGGGA CTTCGGGGGC CGTTTTTGTG GCCCGACCTG AGTCCAAAAA 1020
TCCCGATCGT TTTGGACTCT TTGGTGCACC CCCCTTAGAG GAGGGATATG TGGTTCTGGT 1080
AGGAGACGAG AACCTAAAAC AGTTCCCGCC TCCGTCTGAA TTTTTGCTTT CGGTTTGGGA 1140
CCGAAGCCGC GCCGCGCGTC TTGTCTGCTG CAGCATCGTT CTGTGTTGTC TCTGTCTGAC 1200
TGTGTTTCTG TATTTGTCTG AGAATATGGG CCAGACTGTT ACCACTCCCT TAAGTTTGAC 1260
CTTAGGTCAC TGGAAAGATG TCGAGCGGAT CGCTCACAAC CAGTCGGTAG ATGTCAAGAA 1320
GAGACGTTGG GTTACCTTCT GCTCTGCAGA ATGGCCAACC TTTAACGTCG GATGGCCGCG 1380
AGACGGCACC TTTAACCGAG ACCTCATCAC CCAGGTTAAG ATCAAGGTCT TTTCACCTGG 1440
CCCGCATGGA CACCCAGACC AGGTCCCCTA CATCGTGACC TGGGAAGCCT TGGCTTTTGA 1500
CCCCCCTCCC TGGGTCAAGC CCTTTGTACA CCCTAAGCCT CCGCCTCCTC TTCCTCCATC 1560
CGCCCCGTCT CTCCCCCTTG AACCTCCTCG TTCGACCCCG CCTCGATCCT CCCTTTATCC 1620
AGCCCTCACT CCTTCTCGAC GGTATACAGA CATGATAAGA TACATTGATG AGTTTGGACA 1680
AACCACAACT AGAATGCAGT GAAAAAAATG CTTTATTTGT GAAATTTGTG ATGCTATTGC 1740
TTTATTTGTA ACCATTATAA GCTGCAATAA ACAAGTTGGG GTGGGCGAAG AACTCCAGCA 1800
TGAGATCCCC GCGCTGGAGG ATCATCCAGC CGGCGAACGT GGCGAGAAAG GAAGGGAAGA 1860
AAGCGAAAGG AGCGGGCGCT AGGGCGCTGG CAAGTGTAGC GGTCACGCTG CGCGTAACCA 1920
CCACACCCGC CGCGCTTAAT GCGCCGCTAC AGGGCGCGTG GGGATACCCC CTAGAGCCCC 1980
AGCTGGTTCT TTCCGCCTCA GAAGCCATAG AGCCCACCGC ATCCCCAGCA TGCCTGCTAT 2040
TGTCTTCCCA ATCCTCCCCC TTGCTGTCCT GCCCCACCCC ACCCCCCAGA ATAGAATGAC 2100
ACCTACTCAG ACAATGCGAT GCAATTTCCT CATTTTATTA GGAAAGGACA GTGGGAGTGG 2160
CACCTTCCAG GGTCAAGGAA GGCACGGGGG AGGGGCAAAC AACAGATGGC TGGCAACTAG 2220
AAGGCACAGT CGAGGCTGAT CAGCGAGCTC TAGCATTTAG GTGACACTAT AGAATAGGGC 2280
CCTCTAGATG CATGCTCGAG CGGCCGCCAG TGTGATGGAT ATCTGCAGAA TTCTCAGAAG 2340
AACTCGTCAA GAAGGCGATA GAAGGCGATG CGCTGCGAAT CGGGAGCGGC GATACCGTAA 2400
AGCACGAGGA AGCGGTCAGC CCATTCGCCG CCAAGCTCTT CAGCAATATC ACGGGTAGCC 2460
AACGCTATGT CCTGATAGCG GTCCGCCACA CCCAGCCGGC CACAGTCGAT GAATCCAGAA 2520
AAGCGGCCAT TTTCCACCAT GATATTCGGC AAGCAGGCAT CGCCATGGGT CACGACGAGA 2580
TCCTCGCCGT CGGGCATGCG CGCCTTGAGC CTGGCGAACA GTTCGGCTGG CGCGAGCCCC 2640
TGATGCTCTT CGTCCAGATC ATCCTGATCG ACAAGACCGG CTTCCATCCG AGTACGTGCT 2700
CGCTCGATGC GATGTTTCGC TTGGTGGTCG AATGGGCAGG TAGCCGGATC AAGCGTATGC 2760
AGCCGCCGCA TTGCATCAGC CATGATGGAT ACTTTCTCGG CAGGAGCAAG GTGAGATGAC 2820
AGGAGATCCT GCCCCGGCAC TTCGCCCAAT AGCAGCCAGT CCCTTCCCGC TTCAGTGACA 2880
ACGTCGAGCA CAGCTGCGCA AGGAACGCCC GTCGTGGCCA GCCACGATAG CCGCGCTGCC 2940
TCGTCCTGCA GTTCATTCAG GGCACCGGAC AGGTCGGTCT TGACAAAAAG AACCGGGCGC 3000
CCCTGCGCTG ACAGCCGGAA CACGGCGGCA TCAGAGCAGC CGATTGTCTG TTGTGCCCAG 3060
TCATAGCCGA ATAGCCTCTC CACCCAAGCG GCCGGAGAAC CTGCGTGCAA TCCATCTTGT 3120
TCGATAGCAC CAGCACCGTT GTACAGTTCA TCCATGCCAT GTGTAATCCC AGCAGCTGTT 3180
ACAAACTCAA GAAGGACCAT GTGGTCTCTC TTTTCGTTGG GATCTTTCGA AAGGGCAGAT 3240
TGTGTGGACA GGTAATGGTT GTCTGGTAAA AGGACAGGGC CATCGCCAAT TGGAGTATTT 3300
TGTTGATAAT GGTCTGCTAG TTGAACGCTT CCATCTTCAA TGTTGTGGCG GGTCTTGAAG 3360
TTCACTTTGA TTCCATTCTT TTGTTTGTCT GCCATGATGT ATACATTGTG TGAGTTATAG 3420
TTGTATTCCA ATTTGTGTCC CAGAATGTTG CCATCTTCCT TGAAGTCAAT ACCTTTTAAC 3480
TCGATTCTAT TAACAAGGGT ATCACCTTCA AACTTGACTT CAGCACGTGT CTTGTAGTTG 3540
CCGTCATCTT TGAAGAAGAT GGTCCTTTCC TGTACATAAC CTTCGGGCAT GGCACTCTTG 3600
AAAAAGTCAT GCCGTTTCAT ATGATCCGGG TATCTTGAAA AGCATTGAAC ACCATAGCAC 3660
AGAGTAGTGA CTAGTGTTGG CCATGGAACA GGCAGTTTGC CAGTAGTGCA GATGAACTTC 3720
AGGGTAAGTT TTCCGTATGT TGCATCACCT TCACCCTCTC CACTGACAGA GAACTTGTGG 3780
CCGTTAACAT CACCATCTAA TTCAACAAGA ATTGGGACAA CTCCAGTGAA GAGTTCTTCT 3840
CCTTTGCTAG CCATTTCTTG CGCGCCCGCG GAGGCTGGAT ACGTGTCCCG GTCTGCAGGT 3900
CGAAAGGCCC GGAGATGAGG AAGAGGAGAA CAGCGCGGCA GACGTGCGCT TTTGAAGCGT 3960
GCAGAATGCC GGGCTCCGGA GGACCTTCGC GCCCGCCCCG CCCCTGAGCC CGCCCCTGAG 4020

CCCGCCCCCG GACCCACCCC TTCCCAGCCT CTGAGCCCAG AAAGCGAAGG AGCCAAGCTG 4080 

CTATTGGCCG CTGCCCCAAA GGCCTACCCG CTTCCATTGC TCAGCGGTGC TGTCCATCTG 4140

CACGAGACTA GTGAGACGTG CTACTTCCAT TTGTCACGTC CTGCACGACG CGAGCTGCGG 4200

GGCGGGGGGG AACTTCCTGA CTAGGGGAGG AGTAGAAGGT GGCGCGAAGG GGCCACCAAA 4260 

GAAGGGAGCC GGTTGGCGCT ACCGGTGGAT GTGGAATGTG TGCGAGGCCA GAGGCCACTT 4320 

GTGTAGCGCC AAGTGCCAGC GGGGCTGCTA AAGCGCATGC TCCAGACTGC CTTGGGAAAA 4380 

GCGCCTCCCC TACCCGGTAG AATTCGATAT CAAGCTTATC GATACCGTCG AGATCTCCCG 4440 

ATCCGTCGAG GTCGACGGTA TCGATTAGTC CAATTTGTTA AAGACAGGAT ATCAGTGGTC 4500

CAGGCTCTAG TTTTGACTCA ACAATATCAC CAGCTGAAGC CTATAGAGTA CGAGCCATAG 4560 

ATAAAATAAA AGATTTTATT TAGTCTCCAG AAAAAGGGGG GAATGAAAGA CCCCACCTGT 4620 

AGGTTTGGCA AGCTAGCTTA AGTAACGCCA TTTTGCAAGG CATGGAAAAA TACATAACTG 4680

AGAATAGAGA AGTTCAGATC GGGATCCCAA TTCTTTCGGA CTTTTGAAAG TGATGGTGGT 4740

GGGGGAAGGA TTCGAACCTT CGAAGTCGAT GACGGCAGAT TTAGAGTCTG CTCCCTTTGG 4800

CCGCTCGGGA ACCCCACCAC GGGTAATGCT TTTACTGGCC TGCTCCCTTA TCGGGAAGCG 4860 

GGGCGCATCA TATCAAATGA CGCGCCGCTG TAAAGTGTTA CGTTGAGAAA GAATTGGGAT 4920 

CCCGATCAAG GTCAGGAACA GATGGAACAG CTAGAGAACC ATCAGATGTT TCCAGGGTGC 4980

CCCAAGGACC TGAAATGACC CTGTGCCTTA TTTGAACTAA CCAATCAGTT CGCTTCTCGC 5040 

TTCTGTTCGC GCGCTTCTGC TCCCCGAGCT CAATAAAAGA GCCCACAACC CCTCACTCGG 5100 

GGCGCCAGTC CTCCGATTGA CTGAGTCGCC CGGGTACCCG TGTATCCAAT AAACCCTCTT 5160 

GCAGTTGCAT CCGACTTGTG GTCTCGCTGT TCCTTGGGAG GGTCTCCTCT GAGTGATTGA 5220 

CTACCCGTCA GCGGGGGTCT TTCACCCAGA GTTTGGAACT TACTGTCTTC TTGGGACCTG 5280 

CAGCCCGGGG GATCCACTAG TTCTAGAGCG GCCGCCACCG CGGTGGATTC TGCCTCGCGC 5340

GTTTCGGTGA TGACGGTGAA AACCTCTGAC ACATGCAGCT CCCGGAGACG GTCACAGCTT 5400

GTCTGTAAGC GGATGCCGGG AGCAGACAAG CCCGTCAGGG CGCGTCAGCG GGTGTTGGCG 5460 

GGTGTCGGGG CGCAGCCATG ACCCAGTCAC GTAGCGATAG CGGAGTGTAT ACTGGCTTAA 5520 

CTATGCGGCA TCAGAGCAGA TTGTACTGAG AGTGCACCAT ATGCGGTGTG AAATACCGCA 5580 

CAGATGCGTA AGGAGAAAAT ACCGCATCAG GCGCTCTTCC GCTTCCTCGC TCACTGACTC 5640 

GCTGCGCTCG GTCGTTCGGC TGCGGCGAGC GGTATCAGCT CACTCAAAGG CGGTAATACG 5700 

GTTATCCACA GAATCAGGGG ATAACGCAGG AAAGAACATG TGAGCAAAAG GCCAGCAAAA 5760 

GGCCAGGAAC CGTAAAAAGG CCGCGTTGCT GGCGTTTTTC CATAGGCTCC GCCCCCCTGA 5820 

CGAGCATCAC AAAAATCGAC GCTCAAGTCA GAGGTGGCGA AACCCGACAG GACTATAAAG 5880 

ATACCAGGCG TTTCCCCCTG GAAGCTCCCT CGTGCGCTCT CCTGTTCCGA CCCTGCCGCT 5940 

TACCGGATAC CTGTCCGCCT TTCTCCCTTC GGGAAGCGTG GCGCTTTCTC AATGCTCACG 6000 

CTGTAGGTAT CTCAGTTCGG TGTAGGTCGT TCGCTCCAAG CTGGGCTGTG TGCACGAACC 6060 



CCCCGTTCAG CCCGACCGCT GCGCCTTATC CGGTAACTAT CGTCTTGAGT CCAACCCGGT 6120
AAGACACGAC TTATCGCCAC TGGCAGCAGC CACTGGTAAC AGGATTAGCA GAGCGAGGTA 6180
TGTAGGCGGT GCTACAGAGT TCTTGAAGTG GTGGCCTAAC TACGGCTACA CTAGAAGGAC 6240
AGTATTTGGT ATCTGCGCTC TGCTGAAGCC AGTTACCTTC GGAAAAAGAG TTGGTAGCTC 6300
TTGATCCGGC AAACAAACCA CCGCTGGTAG CGGTGGTTTT TTTGTTTGCA AGCAGCAGAT 6360
TACGCGCAGA AAAAAAGGAT CTCAAGAAGA TCCTTTGATC TTTTCTACGG GGTCTGACGC 6420
TCAGTGGAAC GAAAACTCAC GTTAAGGGAT TTTGGTCATG AGATTATCAA AAAGGATCTT 6480
CACCTAGATC CTTTTAAATT AAAAATGAAG TTTTAAATCA ATCTAAAGTA TATATGAGTA 6540
AACTTGGTCT GACAGTTACC AATGCTTAAT CAGTGAGGCA CCTATCTCAG CGATCTGTCT 6600
ATTTCGTTCA TCCATAGTTG CCTGACTCCC CGTCGTGTAG ATAACTACGA TACGGGAGGG 6660
CTTACCATCT GGCCCCAGTG CTGCAATGAT ACCGCGAGAC CCACGCTCAC CGGCTCCAGA 6720
TTTATCAGCA ATAAACCAGC CAGCCGGAAG GGCCGAGCGC AGAAGTGGTC CTGCAACTTT 6780
ATCCGCCTCC ATCCAGTCTA TTAATTGTTG CCGGGAAGCT AGAGTAAGTA GTTCGCCAGT 6840
TAATAGTTTG CGCAACGTTG TTGCCATTGC TGCAGGCATC GTGGTGTCAC GCTCGTCGTT 6900
TGGTATGGCT TCATTCAGCT CCGGTTCCCA ACGATCAAGG CGAGTTACAT GATCCCCCAT 6960
GTTGTGCAAA AAAGCGGTTA GCTCCTTCGG TCCTCCGATC GTTGTCAGAA GTAAGTTGGC 7020
CGCAGTGTTA TCACTCATGG TTATGGCAGC ACTGCATAAT TCTCTTACTG TCATGCCATC 7080
CGTAAGATGC TTTTCTGTGA CTGGTGAGTA CTCAACCAAG TCATTCTGAG AATAGTGTAT 7140
GCGGCGACCG AGTTGCTCTT GCCCGGCGTC AACACGGGAT AATACCGCGC CACATAGCAG 7200
AACTTTAAAA GTGCTCATCA TTGGAAAACG TTCTTCGGGG CGAAAACTCT CAAGGATCTT 7260
ACCGCTGTTG AGATCCAGTT CGATGTAACC CACTCGTGCA CCCAACTGAT CTTCAGCATC 7320
TTTTACTTTC ACCAGCGTTT CTGGGTGAGC AAAAACAGGA AGGCAAAATG CCGCAAAAAA 7380
GGGAATAAGG GCGACACGGA AATGTTGAAT ACTCATACTC TTCCTTTTTC AATATTATTG 7440
AAGCATTTAT CAGGGTTATT GTCTCATGAG CGGATACATA TTTGAATGTA TTTAGAAAAA 7500
TAAACAAATA GGGGTTCCGC GCACATTTCC CCGAAAAGTG CCACCTGACG TCTAAGAAAC 7560
CATTATTATC ATGACATTAA CCTATAAAAA TAGGCGTATC ACGAGGCCCT TTCGTCT 7617

(2) INFORMATION FOR SEQ ID NO: 35:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 15581 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..15581

(D) OTHER INFORMATION: /note= "pNLnSG11"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35:
TGGAAGGGCT AATTTGGTCC CAAAAAAGAC AAGAGATCCT TGATCTGTGG ATCTACCACA 60
CACAAGGCTA CTTCCCTGAT TGGCAGAACT ACACACCAGG GCCAGGGATC AGATATCCAC 120
TGACCTTTGG ATGGTGCTTC AAGTTAGTAC CAGTTGAACC AGAGCAAGTA GAAGAGGCCA 180
AATAAGGAGA GAAGAACAGC TTGTTACACC CTATGAGCCA GCATGGGATG GAGGACCCGG 240
AGGGAGAAGT ATTAGTGTGG AAGTTTGACA GCCTCCTAGC ATTTCGTCAC ATGGCCCGAG 300
AGCTGCATCC GGAGTACTAC AAAGACTGCT GACATCGAGC TTTCTACAAG GGACTTTCCG 360
CTGGGGACTT TCCAGGGAGG TGTGGCCTGG GCGGGACTGG GGAGTGGCGA GCCCTCAGAT 420
GCTACATATA AGCAGCTGCT TTTTGCCTGT ACTGGGTCTC TCTGGTTAGA CCAGATCTGA 480
GCCTGGGAGC TCTCTGGCTA ACTAGGGAAC CCACTGCTTA AGCCTCAATA AAGCTTGCCT 540
TGAGTGCTCA AAGTAGTGTG TGCCCGTCTG TTGTGTGACT CTGGTAACTA GAGATCCCTC 600
AGACCCTTTT AGTCAGTGTG GAAAATCTCT AGCAGTGGCG CCCGAACAGG GACTTGAAAG 660
CGAAAGTAAA GCCAGAGGAG ATCTCTCGAC GCAGGACTCG GCTTGCTGAA GCGCGCACGG 720
CAAGAGGCGA GGGGCGGCGA CTGGTGAGTA CGCCAAAAAT TTTGACTAGC GGAGGCTAGA 780
AGGAGAGAGA TGGGTGCGAG AGCGTCGGTA TTAAGCGGGG GAGAATTAGA TAAATGGGAA 840
AAAATTCGGT TAAGGCCAGG GGGAAAGAAA CAATATAAAC TAAAACATAT AGTATGGGCA 900
AGCAGGGAGC TAGAACGATT CGCAGTTAAT CCTGGCCTTT TAGAGACATC AGAAGGCTGT 960
AGACAAATAC TGGGACAGCT ACAACCATCC CTTCAGACAG GATCAGAAGA ACTTAGATCA 1020
TTATATAATA CAATAGCAGT CCTCTATTGT GTGCATCAAA GGATAGATGT AAAAGACACC 1080
AAGGAAGCCT TAGATAAGAT AGAGGAAGAG CAAAACAAAA GTAAGAAAAA GGCACAGCAA 1140
GCAGCAGCTG ACACAGGAAA CAACAGCCAG GTCAGCCAAA ATTACCCTAT AGTGCAGAAC 1200
CTCCAGGGGC AAATGGTACA TCAGGCCATA TCACCTAGAA CTTTAAATGC ATGGGTAAAA 1260
GTAGTAGAAG AGAAGGCTTT CAGCCCAGAA GTAATACCCA TGTTTTCAGC ATTATCAGAA 1320
GGAGCCACCC CACAAGATTT AAATACCATG CTAAACACAG TGGGGGGACA TCAAGCAGCC 1380
ATGCAAATGT TAAAAGAGAC CATCAATGAG GAAGCTGCAG AATGGGATAG ATTGCATCCA 1440
GTGCATGCAG GGCCTATTGC ACCAGGCCAG ATGAGAGAAC CAAGGGGAAG TGACATAGCA 1500
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GGAACTACTA GTACCCTTCA GGAACAAATA 
GTAGGAGAAA TCTATAAAAG ATGGATAATC 
5 AGCCCTACCA GCATTCTGGA CATAAGACAA 

GACCGATTCT ATAAAACTCT AAGAGCCGAG 
ACAGAAACCT TGTTGGTCCA AAATGCGAAC 

10 

GGACCAGGAG CGACACTAGA AGAAATGATG 
CATAAAGCAA GAGTTTTGGC TGAAGCAATG 
15 ATACAGAAAG GCAATTTTAG GAACCAAAGA 

GAAGGGCACA TAGCCAAAAA TTGCAGGGCC 
AAGGAAGGAC ACCAAATGAA AGATTGTACT 

20 

TGGCCTTCCC ACAAGGGAAG GCCAGGGAAT 
CCACCAGAAG AGAGCTTCAG GTTTGGGGAA 
25 CCGATAGACA AGGAACTGTA TCCTTTAGCT 

TCGTCACAAT AAAGATAGGG GGGCAATTAA 
ATACAGTATT AGAAGAAATG AATTTGCCAG 

30 

TTGGAGGTTT TATCAAAGTA GGACAGTATG 
AAGCTATAGG TACAGTATTA GTAGGACCTA 
GGATGGATGA CACATAATCC ACCTATCCCA 1560
CTGGGATTAA ATAAAATAGT AAGAATGTAT 1620
GGACCAAAGG AACCCTTTAG AGACTATGTA 1680
CAAGCTTCAC AAGAGGTAAA AAATTGGATG 1740
CCAGATTGTA AGACTATTTT AAAAGCATTG 1800
ACAGCATGTC AGGGAGTGGG GGGACCCGGC 1860
AGCCAAGTAA CAAATCCAGC TACCATAATG 1920
AAGACTGTTA AGTGTTTCAA TTGTGGCAAA 1980
CCTAGGAAAA AGGGCTGTTG GAAATGTGGA 2040
GAGAGACAGG CTAATTTTTT AGGGAAGATC 2100
TTTCTTCAGA GCAGACCAGA GCCAACAGCC 2160
GAGACAACAA CTCCCTCTCA GAAGCAGGAG 2220
TCCCTCAGAT CACTCTTTGG CAGCGACCCC 2280
AGGAAGCTCT ATTAGATACA GGAGCAGATG 2340
GAAGATGGAA ACCAAAAATG ATAGGGGGAA 2400
ATCAGATACT CATAGAAATC TGCGGACATA 2460
CACCTGTCAA CATAATTGGA AGAAATCTGT 2520
TGACTCAGAT TGGCTGCACT TTAAATTTTC CCATTAGTCC TATTGAGACT GTACCAGTAA 2580
AATTAAAGCC AGGAATGGAT GGCCCAAAAG TTAAACAATG GCCATTGACA GAAGAAAAAA 2640
TAAAAGCATT AGTAGAAATT TGTACAGAAA TGGAAAAGGA AGGAAAAATT TCAAAAATTG 2700
GGCCTGAAAA TCCATACAAT ACTCCAGTAT TTGCCATAAA GAAAAAAGAC AGTACTAAAT 2760
GGAGAAAATT AGTAGATTTC AGAGAACTTA ATAAGAGAAC TCAAGATTTC TGGGAAGTTC 2820
AATTAGGAAT ACCACATCCT GCAGGGTTAA AACAGAAAAA ATCAGTAACA GTACTGGATG 2880
TGGGCGATGC ATATTTTTCA GTTCCCTTAG ATAAAGACTT CAGGAAGTAT ACTGCATTTA 2940
CCATACCTAG TATAAACAAT GAGACACCAG GGATTAGATA TCAGTACAAT GTGCTTCCAC 3000
AGGGATGGAA AGGATCACCA GCAATATTCC AGTGTAGCAT GACAAAAATC TTAGAGCCTT 3060
TTAGAAAACA AAATCCAGAC ATAGTCATCT ATCAATACAT GGATGATTTG TATGTAGGAT 3120
CTGACTTAGA AATAGGGCAG CATAGAACAA AAATAGAGGA ACTGAGACAA CATCTGTTGA 3180
GGTGGGGATT TACCACACCA GACAAAAAAC ATCAGAAAGA ACCTCCATTC CTTTGGATGG 3240
GTTATGAACT CCATCCTGAT AAATGGACAG TACAGCCTAT AGTGCTGCCA GAAAAGGACA 3300
GCTGGACTGT CAATGACATA CAGAAATTAG TGGGAAAATT GAATTGGGCA AGTCAGATTT 3360
ATGCAGGGAT TAAAGTAAGG CAATTATGTA AACTTCTTAG GGGAACCAAA GCACTAACAG 3420
AAGTAGTACC ACTAACAGAA GAAGCAGAGC TAGAACTGGC AGAAAACAGG GAGATTCTAA 3480
AAGAACCGGT ACATGGAGTG TATTATGACC CATCAAAAGA CTTAATAGCA GAAATACAGA 3540
AGCAGGGGCA AGGCCAATGG ACATATCAAA TTTATCAAGA GCCATTTAAA AATCTGAAAA 3600
CAGGAAAATA TGCAAGAATG AAGGGTGCCC ACACTAATGA TGTGAAACAA TTAACAGAGG 
CAGTACAAAA AATAGCCACA GAAAGCATAG TAATATGGGG AAAGACTCCT AAATTTAAAT 
TACCCATACA AAAGGAAACA TGGGAAGCAT GGTGGACAGA GTATTGGCAA GCCACCTGGA 
TTCCTGAGTG GGAGTTTGTC AATACCCCTC CCTTAGTGAA GTTATGGTAC CAGTTAGAGA 
AAGAACCCAT AATAGGAGCA GAAACTTTCT ATGTAGATGG GGCAGCCAAT AGGGAAACTA 
AATTAGGAAA AGCAGGATAT GTAACTGACA GAGGAAGACA AAAAGTTGTC CCCCTAACGG 
ACACAACAAA TCAGAAGACT GAGTTACAAG CAATTCATCT AGCTTTGCAG GATTCGGGAT 
TAGAAGTAAA CATAGTGACA GACTCACAAT ATGCATTGGG AATCATTCAA GCACAACCAG
ATAAGAGTGA ATCAGAGTTA GTCAGTCAAA TAATAGAGCA GTTAATAAAA AAGGAAAAAG 
TCTACCTGGC ATGGGTACCA GCACACAAAG GAATTGGAGG AAATGAACAA GTAGATGGGT 
TGGTCAGTGC TGGAATCAGG AAAGTACTAT TTTTAGATGG AATAGATAAG GCCCAAGAAG 
AACATGAGAA ATATCACAGT AATTGGAGAG CAATGGCTAG TGATTTTAAC CTACCACCTG 
TAGTAGCAAA AGAAATAGTA GCCAGCTGTG ATAAATGTCA GCTAAAAGGG GAAGCCATGC 
ATGGACAAGT AGACTGTAGC CCAGGAATAT GGCAGCTAGA TTGTACACAT TTAGAAGGAA
AAGTTATCTT GGTAGCAGTT CATGTAGCCA GTGGATATAT AGAAGCAGAA GTAATTCCAG 
CAGAGACAGG GCAAGAAACA GCATACTTCC TCTTAAAATT AGCAGGAAGA TGGCCAGTAA 
AAACAGTACA TACAGACAAT GGCAGCAATT TCACCAGTAC TACAGTTAAG GCCGCCTGTT 
GGTGGGCGGG GATCAAGCAG GAATTTGGCA TTCCCTACAA TCCCCAAAGT CAAGGAGTAA 
TAGAATCTAT GAATAAAGAA TTAAAGAAAA TTATAGGACA GGTAAGAGAT CAGGCTGAAC 
ATCTTAAGAC AGCAGTACAA ATGGCAGTAT TCATCCACAA TTTTAAAAGA AAAGGGGGGA 
TTGGGGGGTA CAGTGCAGGG GAAAGAATAG TAGACATAAT AGCAACAGAC ATACAAACTA 
AAGAATTACA AAAACAAATT ACAAAAATTC AAAATTTTCG GGTTTATTAC AGGGACAGCA 
GAGATCCAGT TTGGAAAGGA CCAGCAAAGC TCCTCTGGAA AGGTGAAGGG GCAGTAGTAA 
TACAAGATAA TAGTGACATA AAAGTAGTGC CAAGAAGAAA AGCAAAGATC ATCAGGGATT 
ATGGAAAACA GATGGCAGGT GATGATTGTG TGGCAAGTAG ACAGGATGAG GATTAACACA 
TGGAAAAGAT TAGTAAAACA CCATATGTAT ATTTCAAGGA AAGCTAAGGA CTGGTTTTAT 
AGACATCACT ATGAAAGTAC TAATCCAAAA ATAAGTTCAG AAGTACACAT CCCACTAGGG 
GATGCTAAAT TAGTAATAAC AACATATTGG GGTCTGCATA CAGGAGAAAG AGACTGGCAT 
TTGGGTCAGG GAGTCTCCAT AGAATGGAGG AAAAAGAGAT ATAGCACACA AGTAGACCCT 
GACCTAGCAG ACCAACTAAT TCATCTGCAC TATTTTGATT GTTTTTCAGA ATCTGCTATA 
AGAAATACCA TATTAGGACG TATAGTTAGT CCTAGGTGTG AATATCAAGC AGGACATAAC 
AAGGTAGGAT CTCTACAGTA CTTGGCACTA GCAGCATTAA TAAAACCAAA ACAGATAAAG 
CCACCTTTGC CTAGTGTTAG GAAACTGACA GAGGACAGAT GGAACAAGCC CCAGAAGACC 
AAGGGCCACA GAGGGAGCCA TACAATGAAT GGACACTAGA GCTTTTAGAG GAACTTAAGA 
GTGAAGCTGT TAGACATTTT CCTAGGATAT GGCTCCATAA CTTAGGACAA CATATCTATG 
AAACTTACGG GGATACTTGG GCAGGAGTGG AAGCCATAAT AAGAATTCTG CAACAACTGC 5760
TGTTTATCCA TTTCAGAATT GGGTGTCGAC ATAGCAGAAT AGGCGTTACT CGACAGAGGA 5820
GAGCAAGAAA TGGAGCCAGT AGATCCTAGA CTAGAGCCCT GGAAGCATCC AGGAAGTCAG 5880
CCTAAAACTG CTTGTACCAA TTGCTATTGT AAAAAGTGTT GCTTTCATTG CCAAGTTTGT 5940
TTCATGACAA AAGCCTTAGG CATCTCCTAT GGCAGGAAGA AGCGGAGACA GCGACGAAGA 6000
GCTCATCAGA ACAGTCAGAC TCATCAAGCT TCTCTATCAA AGCAGTAAGT AGTACATGTA 6060
ATGCAACCTA TAATAGTAGC AATAGTAGCA TTAGTAGTAG CAATAATAAT AGCAATAGTT 6120
GTGTGGTCCA TAGTAATCAT AGAATATAGG AAAATATTAA GACAAAGAAA AATAGACAGG 6180
TTAATTGATA GACTAATAGA AAGAGCAGAA GACAGTGGCA ATGAGAGTGA AGGAGAAGTA 6240
TCAGCACTTG TGGAGATGGG GGTGGAAATG GGGCACCATG CTCCTTGGGA TATTGATGAT 6300
CTGTAGTGCT ACAGAAAAAT TGTGGGTCAC AGTCTATTAT GGGGTACCTG TGTGGAAGGA 6360
AGCAACCACC ACTCTATTTT GTGCATCAGA TGCTAAAGCA TATGATACAG AGGTACATAA 6420
TGTTTGGGCC ACACATGCCT GTGTACCCAC AGACCCCAAC CCACAAGAAG TAGTATTGGT 6480
AAATGTGACA GAAAATTTTA ACATGTGGAA AAATGACATG GTAGAACAGA TGCATGAGGA 6540
TATAATCAGT TTATGGGATC AAAGCCTAAA GCCATGTGTA AAATTAACCC CACTCTGTGT 6600
TAGTTTAAAG TGCACTGATT TGAAGAATGA TACTAATACC AATAGTAGTA GCGGGAGAAT 6660
GATAATGGAG AAAGGAGAGA TAAAAAACTG CTCTTTCAAT ATCAGCACAA GCATAAGAGA 6720
TAAGGTGCAG AAAGAATATG CATTCTTTTA TAAACTTGAT ATAGTACCAA TAGATAATAC 6780
CAGCTATAGG TTGATAAGTT GTAACACCTC AGTCATTACA CAGGCCTGTC CAAAGGTATC 6840
CTTTGAGCCA ATTCCCATAC ATTATTGTGC CCCGGCTGGT TTTGCGATTC TAAAATGTAA 6900
TAATAAGACG TTCAATGGAA CAGGACCATG TACAAATGTC AGCACAGTAC AATGTACACA 6960
TGGAATCAGG CCAGTAGTAT CAACTCAACT GCTGTTAAAT GGCAGTCTAG CAGAAGAAGA 7020
TGTAGTAATT AGATCTGCCA ATTTCACAGA CAATGCTAAA ACCATAATAG TACAGCTGAA 7080
CACATCTGTA GAAATTAATT GTACAAGACC CAACAACAAT ACAAGAAAAA GTATCCGTAT 7140
CCAGAGGGGA CCAGGGAGAG CATTTGTTAC AATAGGAAAA ATAGGAAATA TGAGACAAGC 7200
ACATTGTAAC ATTAGTAGAG CAAAATGGAA TGCCACTTTA AAACAGATAG CTAGCAAATT 7260
AAGAGAACAA TTTGGAAATA ATAAAACAAT AATCTTTAAG CAATCCTCAG GAGGGGACCC 7320
AGAAATTGTA ACGCACAGTT TTAATTGTGG AGGGGAATTT TTCTACTGTA ATTCAACACA 7380
ACTGTTTAAT AGTACTTGGT TTAATAGTAC TTGGAGTACT GAAGGGTCAA ATAACACTGA 7440
AGGAAGTGAC ACAATCACAC TCCCATGCAG AATAAAACAA TTTATAAACA TGTGGCAGGA 7500
AGTAGGAAAA GCAATGTATG CCCCTCCCAT CAGTGGACAA ATTAGATGTT CATCAAATAT 7560
TACTGGGCTG CTATTAACAA GAGATGGTGG TAATAACAAC AATGGGTCCG AGATCTTCAG 7620
ACCTGGAGGA GGCGATATGA GGGACAATTG GAGAAGTGAA TTATATAAAT ATAAAGTAGT 7680
AAAAATTGAA CCATTAGGAG TAGCACCCAC CAAGGCAAAG AGAAGAGTGG TGCAGAGAGA 7740
AAAAAGAGCA GTGGGAATAG GAGCTTTGTT CCTTGGGTTC TTGGGAGCAG CAGGAAGCAC 7800
TATGGGCGCA GCGTCAATGA CGCTGACGGT ACAGGCCAGA CAATTATTGT CTGATATAGT 7860
GCAGCAGCAG AACAATTTGC TGAGGGCTAT TGAGGCGCAA CAGCATCTGT TGCAACTCAC 7920
AGTCTGGGGC ATCAAACAGC TCCAGGCAAG AATCCTGGCT GTGGAAAGAT ACCTAAAGGA 7980
TCAACAGCTC CTGGGGATTT GGGGTTGCTC TGGAAAACTC ATTTGCACCA CTGCTGTGCC 8040
TTGGAATGCT AGTTGGAGTA ATAAATCTCT GGAACAGATT TGGAATAACA TGACCTGGAT 8100
GGAGTGGGAC AGAGAAATTA ACAATTACAC AAGCTTAATA CACTCCTTAA TTGAAGAATC 8160
GCAAAACCAG CAAGAAAAGA ATGAACAAGA ATTATTGGAA TTAGATAAAT GGGCAAGTTT 8220
GTGGAATTGG TTTAACATAA CAAATTGGCT GTGGTATATA AAATTATTCA TAATGATAGT 8280
AGGAGGCTTG GTAGGTTTAA GAATAGTTTT TGCTGTACTT TCTATAGTGA ATAGAGTTAG 8340
GCAGGGATAT TCACCATTAT CGTTTCAGAC CCACCTCCCA ATCCCGAGGG GACCCGACAG 8400
GCCCGAAGGA ATAGAAGAAG AAGGTGGAGA GAGAGACAGA GACAGATCCA TTCGATTAGT 8460
GAACGGATCC TTAGCACTTA TCTGGGACGA TCTGCGGAGC CTGTGCCTCT TCAGCTACCA 8520
CCGCTTGAGA GACTTACTCT TGATTGTAAC GAGGATTGTG GAACTTCTGG GACGCAGGGG 8580
GTGGGAAGCC CTCAAATATT GGTGGAATCT CCTACAGTAT TGGAGTCAGG AACTAAAGAA 8640
TAGTGCTGTT AACTTGCTCA ATGCCACAGC CATAGCAGTA GCTGAGGGGA CAGATAGGGT 8700
TATAGAAGTA TTACAAGCAG CTTATAGAGC TATTCGCCAC ATACCTAGAA GAATAAGACA 8760
GGGCTTGGAA AGGATTTTGC TATAAGATGG GTGGCAAGTG GTCAAAAAGT AGTGTGATTG 8820
GATGGCCTGC TGTAAGGGAA AGAATGAGAC GAGCTGAGCA AGAAATGGCT AGCAAAGGAG 8880
AAGAACTCTT CACTGGAGTT GTCCCAATTC TTGTTGAATT AGATGGTGAT GTTAACGGCC 8940
ACAAGTTCTC TGTCAGTGGA GAGGGTGAAG GTGATGCAAC ATACGGAAAA CTTACCCTGA 9000
AGTTCATCTG CACTACTGGC AAACTGCCTG TTCCATGGCC AACACTTGTC ACTACTCTCT 9060
CTTATGGTGT TCAATGCTTT TCAAGATACC CGGATCATAT GAAACGGCAT GACTTTTTCA 9120
AGAGTGCCAT GCCCGAAGGT TATGTACAGG AAAGGACCAT CTTCTTCAAA GATGACGGCA 9180
ACTACAAGAC ACGTGCTGAA GTCAAGTTTG AAGGTGATAC CCTTGTTAAT AGAATCGAGT 9240
TAAAAGGTAT TGACTTCAAG GAAGATGGCA ACATTCTGGG ACACAAATTG GAATACAACT 9300
ATAACTCACA CAATGTATAC ATCATGGCAG ACAAACAAAA GAATGGAATC AAAGTGAACT 9360
TCAAGACCCG CCACAACATT GAAGATGGAA GCGTTCAACT AGCAGACCAT TATCAACAAA 9420
ATACTCCAAT TGGCGATGGC CCTGTCCTTT TACCAGACAA CCATTACCTG TCCACACAAT 9480
CTGCCCTTTC GAAAGATCCC AACGAAAAGA GAGACCACAT GGTCCTTCTT GAGTTTGTAA 9540
CAGCTGCTGG GATTACACAT GGCATGGATG AACTGTACAA CGGACTCGAG ACCTAGAAAA 9600
ACATGGAGCA ATCACAAGTA GCAATACAGC AGCTAACAAT GCTGCTTGTG CCTGGCTAGA 9660
AGCACAAGAG GAGGAAGAGG TGGGTTTTCC AGTCACACCT CAGGTACCTT TAAGACCAAT 9720
GACTTACAAG GCAGCTGTAG ATCTTAGCCA CTTTTTAAAA GAAAAGGGGG GACTGGAAGG 9780
GCTAATTCAC TCCCAAAGAA GACAAGATAT CCTTGATCTG TGGATCTACC ACACACAAGG 9840
CTACTTCCCT GATTGGCAGA ACTACACACC AGGGCCAGGG GTCAGATATC CACTGACCTT 9900
PCT/US97/07625

TGGATGGTGC TACAAGCTAG TACCAGTTGA GCCAGATAAG GTAGAAGAGG CCAATAAAGG 9960
AGAGAACACC AGCTTGTTAC ACCCTGTGAG CCTGCATGGA ATGGATGACC CTGAGAGAGA 10020
AGTGTTAGAG TGGAGGTTTG ACAGCCGCCT AGCATTTCAT CACGTGGCCC GAGAGCTGCA 10080
TCCGGAGTAC TTCAAGAACT GCTGACATCG AGCTTGCTAC AAGGGACTTT CCGCTGGGGA 10140
CTTTCCAGGG AGGCGTGGCC TGGGCGGGAC TGGGGAGTGG CGAGCCCTCA GATGCTGCAT 10200
ATAAGCAGCT GCTTTTTGCC TGTACTGGGT CTCTCTGGTT AGACCAGATC TGAGCCTGGG 10260
AGCTCTCTGG CTAACTAGGG AACCCACTGC TTAAGCCTCA ATAAAGCTTG CCTTGAGTGC 10320
TTCAAGTAGT GTGTGCCCGT CTGTTGTGTG ACTCTGGTAA CTAGAGATCC CTCAGACCCT 10380
TTTAGTCAGT GTGGAAAATC TCTAGCACCC CCCAGGAGGT AGAGGTTGCA GTGAGCCAAG 10440
ATCGCGCCAC TGCATTCCAG CCTGGGCAAG AAAACAAGAC TGTCTAAAAT AATAATAATA 10500
AGTTAAGGGT ATTAAATATA TTTATACATG GAGGTCATAA AAATATATAT ATTTGGGCTG 10560
GGCGCAGTGG CTCACACCTG CGCCCGGCCC TTTGGGAGGC CGAGGCAGGT GGATCACCTG 10620
AGTTTGGGAG TTCCAGACCA GCCTGACCAA CATGGAGAAA CCCCTTCTCT GTGTATTTTT 10680
AGTAGATTTT ATTTTATGTG TATTTTATTC ACAGGTATTT CTGGAAAACT GAAACTGTTT 10740
TTCCTCTACT CTGATACCAC AAGAATCATC AGCACAGAGG AAGACTTCTG TGATCAAATG 10800
TGGTGGGAGA GGGAGGTTTT CACCAGCACA TGAGCAGTCA GTTCTGCCGC AGACTCGGCG 10860
GGTGTCCTTC GGTTCAGTTC CAACACCGCC TGCCTGGAGA GAGGTCAGAC CACAGGGTGA 10920
GGGCTCAGTC CCCAAGACAT AAACACCCAA GACATAAACA CCCAACAGGT CCACCCCGCC 10980
TGCTGCCCAG GCAGAGCCGA TTCACCAAGA CGGGAATTAG GATAGAGAAA GAGTAAGTCA 11040
CACAGAGCCG GCTGTGCGGG AGAACGGAGT TCTATTATGA CTCAAATCAG TCTCCCCAAG 11100
CATTCGGGGA TCAGAGTTTT TAAGGATAAC TTAGTGTGTA GGGGGCCAGT GAGTTGGAGA 11160
TGAAAGCGTA GGGAGTCGAA GGTGTCCTTT TGCGCCGAGT CAGTTCCTGG GTGGGGGCCA 11220
CAAGATCGGA TGAGCCAGTT TATCAATCCG GGGGTGCCAG CTGATCCATG GAGTGCAGGG 11280
TCTGCAAAAT ATCTCAAGCA CTGATTGATC TTAGGTTTTA CAATAGTGAT GTTACCCCAG 11340
GAACAATTTG GGGAAGGTCA GAATCTTGTA GCCTGTAGCT GCATGACTCC TAAACCATAA 11400
TTTCTTTTTT GTTTTTTTTT TTTTATTTTT GAGACAGGGT CTCACTCTGT CACCTAGGCT 11460
GGAGTGCAGT GGTGCAATCA CAGCTCACTG CAGCCTCAAC GTCGTAAGCT CAAGCGATCC 11520
TCCCACCTCA GCCTGCCTGG TAGCTGAGAC TACAAGCGAC GCCCCAGTTA ATTTTTGTAT 11580
TTTTGGTAGA GGCAGCGTTT TGCCGTGTGG CCCTGGCTGG TCTCGAACTC CTGGGCTCAA 11640
GTGATCCAGC CTCAGCCTCC CAAAGTGCTG GGACAACCGG GGCCAGTCAC TGCACCTGGC 11700
CCTAAACCAT AATTTCTAAT CTTTTGGCTA ATTTGTTAGT CCTACAAAGG CAGTCTAGTC 11760
CCCAGGCAAA AAGGGGGTTT GTTTCGGGAA AGGGCTGTTA CTGTCTTTGT TTCAAACTAT 11820
AAACTAAGTT CCTCCTAAAC TTAGTTCGGC CTACACCCAG GAATGAACAA GGAGAGCTTG 11880
GAGGTTAGAA GCACGATGGA ATTGGTTAGG TCAGATCTCT TTCACTGTCT GAGTTATAAT 11940
TTTGCAATGG TGGTTCAAAG ACTGCCCGCT TCTGACACCA GTCGCTGCAT TAATGAATCG 12000
GCCAACGCGC GGGGAGAGGC GGTTTGCGTA TTGGCGCTCT TCCGCTTCCT CGCTCACTGA 12060
CTCGCTGCGC TCGGTCGTTC GGCTGCGGCG AGCGGTATCA GCTCACTCAA AGGCGGTAAT 12120
ACGGTTATCC ACAGAATCAG GGGATAACGC AGGAAAGAAC ATGTGAGCAA AAGGCCAGCA 12180
AAAGGCCAGG AACCGTAAAA AGGCCGCGTT GCTGGCGTTT TTCCATAGGC TCCGCCCCCC 12240
TGACGAGCAT CACAAAAATC GACGCTCAAG TCAGAGGTGG CGAAACCCGA CAGGACTATA 12300
AAGATACCAG GCGTTTCCCC CTGGAAGCTC CCTCGTGCGC TCTCCTGTTC CGACCCTGCC 12360
GCTTACCGGA TACCTGTCCG CCTTTCTCCC TTCGGGAAGC GTGGCGCTTT CTCAATGCTC 12420
ACGCTGTAGG TATCTCAGTT CGGTGTAGGT CGTTCGCTCC AAGCTGGGCT GTGTGCACGA 12480
ACCCCCCGTT CAGCCCGACC GCTGCGCCTT ATCCGGTAAC TATCGTCTTG AGTCCAACCC 12540
GGTAAGACAC GACTTATCGC CACTGGCAGC AGCCACTGGT AACAGGATTA GCAGAGCGAG 12600
GTATGTAGGC GGTGCTACAG AGTTCTTGAA GTGGTGGCCT AACTACGGCT ACACTAGAAG 12660
GACAGTATTT GGTATCTGCG CTCTGCTGAA GCCAGTTACC TTCGGAAAAA GAGTTGGTAG 12720
CTCTTGATCC GGCAAACAAA CCACCGCTGG TAGCGGTGGT TTTTTTGTTT GCAAGCAGCA 12780
GATTACGCGC AGAAAAAAAG GATCTCAAGA AGATCCTTTG ATCTTTTCTA CGGGGTCTGA 12840
CGCTCAGTGG AACGAAAACT CACGTTAAGG GATTTTGGTC ATGAGATTAT CAAAAAGGAT 12900
CTTCACCTAG ATCCTTTTAA ATTAAAAATG AAGTTTTAAA TCAATCTAAA GTATATATGA 12960
GTAAACTTGG TCTGACAGTT ACCAATGCTT AATCAGTGAG GCACCTATCT CAGCGATCTG 13020
TCTATTTCGT TCATCCATAG TTGCCTGACT CCCCGTCGTG TAGATAACTA CGATACGGGA 13080
GGGCTTACCA TCTGGCCCCA GTGCTGCAAT GATACCGCGA GACCCACGCT CACCGGCTCC 13140
AGATTTATCA GCAATAAACC AGCCAGCCGG AAGGGCCGAG CGCAGAAGTG GTCCTGCAAC 13200
TTTATCCGCC TCCATCCAGT CTATTAATTG TTGCCGGGAA GCTAGAGTAA GTAGTTCGCC 13260
AGTTAATAGT TTGCGCAACG TTGTTGCCAT TGCTACAGGC ATCGTGGTGT CACGCTCGTC 13320
GTTTGGTATG GCTTCATTCA GCTCCGGTTC CCAACGATCA AGGCGAGTTA CATGATCCCC 13380
CATGTTGTGC AAAAAAGCGG TTAGCTCCTT CGGTCCTCCG ATCGTTGTCA GAAGTAAGTT 13440
GGCCGCAGTG TTATCACTCA TGGTTATGGC AGCACTGCAT AATTCTCTTA CTGTCATGCC 13500
ATCCGTAAGA TGCTTTTCTG TGACTGGTGA GTACTCAACC AAGTCATTCT GAGAATAGTG 13560
TATGCGGCGA CCGAGTTGCT CTTGCCCGGC GTCAATACGG GATAATACCG CGCCACATAG 13620
CAGAACTTTA AAAGTGCTCA TCATTGGAAA ACGTTCTTCG GGGCGAAAAC TCTCAAGGAT 13680
CTTACCGCTG TTGAGATCCA GTTCGATGTA ACCCACTCGT GCACCCAACT GATCTTCAGC 13740
ATCTTTTACT TTCACCAGCG TTTCTGGGTG AGCAAAAACA GGAAGGCAAA ATGCCGCAAA 13800
AAAGGGAATA AGGGCGACAC GGAAATGTTG AATACTCATA CTCTTCCTTT TTCAATATTA 13860
TTGAAGCATT TATCAGGGTT ATTGTCTCAT GAGCGGATAC ATATTTGAAT GTATTTAGAA 13920
AAATAAACAA ATAGGGGTTC CGCGCACATT TCCCCGAAAA GTGCCACCTG ACGTCTAAGA 13980
AACCATTATT ATCATGACAT TAACCTATAA AAATAGGCGT ATCACGAGGC CCTTTCGTCT 14040
TCAAGAACTG CCTCGCGCGT TTCGGTGATG ACGGTGAAAA CCTCTGACAC ATGCAGCTCC 14100
CGGAGACGGT CACAGCTTGT CTGTAAGCGG ATGCCGGGAG CAGACAAGCC CGTCAGGGCG 14160
CGTCAGCGGG TGTTGGCGGG TGTCGGGGCG CAGCCATGAC CCAGTCACGT AGCGATAGCG 14220
GAGTGTACTG GCTTAACTAT GCGGCATCAG AGCAGATTGT ACTGAGAGTG CACCATATGC 14280
GGTGTGAAAT ACCGCACAGA TGCGTAAGGA GAAAATACCG CATCAGGCGC CATTCGCCAT 14340
TCAGGCTGCG CAACTGTTGG GAAGGGCGAT CGGTGCGGGC CTCTTCGCTA TTACGCCAGC 14400
GCGGGGAGGC AGAGATTGCA GTAAGCTGAG ATCGCAGCAC TGCACTCCAG CCTGGGCGAC 14460
AGAGTAAGAC TCTGTCTCAA AAATAAAATA AATAAATCAA TCAGATATTC CAATCTTTTC 14520
CTTTATTTAT TTATTTATTT TCTATTTTGG AAACACAGTC CTTCCTTATT CCAGAATTAC 14580
ACATATATTC TATTTTTCTT TATATGCTCC AGTTTTTTTT AGACCTTCAC CTGAAATGTG 14640
TGTATACAAA ATCTAGGCCA GTCCAGCAGA GCCTAAAGGT AAAAAATAAA ATAATAAAAA 14700
ATAAATAAAA TCTAGCTCAC TCCTTCACAT CAAAATGGAG ATACAGCTGT TAGCATTAAA 14760
TACCAAATAA CCCATCTTGT CCTCAATAAT TTTAAGCGCC TCTCTCCACC ACATCTAACT 14820
CCTGTCAAAG GCATGTGCCC CTTCCGGGCG CTCTGCTGTG CTGCCAACCA ACTGGCATGT 14880
GGACTCTGCA GGGTCCCTAA CTGCCAAGCC CCACAGTGTG CCCTGAGGCT GCCCCTTCCT 14940
TCTAGCGGCT GCCCCCACTC GGCTTTGCTT TCCCTAGTTT CAGTTACTTG CGTTCAGCCA 15000
AGGTCTGAAA CTAGGTGCGC ACAGAGCGGT AAGACTGCGA GAGAAAGAGA CCAGCTTTAC 15060
AGGGGGTTTA TCACAGTGCA CCCTGACAGT CGTCAGCCTC ACAGGGGGTT TATCACATTG 15120
CACCCTGACA GTCGTCAGCC TCACAGGGGG TTTATCACAG TGCACCCTTA CAATCATTCC 15180
ATTTGATTCA CAATTTTTTT AGTCTCTACT GTGCCTAACT TGTAAGTTAA ATTTGATCAG 15240
AGGTGTGTTC CCAGAGGGGA AAACAGTATA TACAGGGTTC AGTACTATCG CATTTCAGGC 15300
CTCCACCTGG GTCTTGGAAT GTGTCCCCCG AGGGGTGATG ACTACCTCAG TTGGATCTCC 15360
ACAGGTCACA GTGACACAAG ATAACCAAGA CACCTCCCAA GGCTACCACA ATGGGCCGCC 15420
CTCCACGTGC ACATGGCCGG AGGAACTGCC ATGTCGGAGG TGCAAGCACA CCTGCGCATC 15480
AGAGTCCTTG GTGTGGAGGG AGGGACCAGC GCAGCTTCCA GCCATCCACC TGATGAACAG 15540
AACCTAGGGA AAGCCCCAGT TCTACTTACA CCAGGAAAGG C
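Each full line of the listing above is meant to hold six groups of ten bases followed by the cumulative base count at the end of that line, so consecutive counters should step by 60 (1560, 1620, 1680, ...). As a minimal sketch, assuming exactly that layout, the Python below flags lines whose grouping or counter is inconsistent; a mis-read counter or a line whose two halves were separated during scanning shows up as a problem. The function name `check_listing` is hypothetical, and the two example lines are assembled from positions 2521 to 2640 of the listing.

```python
import re

# One group of bases: 1 to 10 characters drawn from A, C, G, T (sketch assumption).
GROUP = re.compile(r"[ACGT]{1,10}")


def check_listing(lines, line_len=60):
    """Report layout problems in numbered sequence-listing lines.

    Assumes each line holds groups of ten bases followed by the cumulative
    base count at the end of that line (line_len bases per full line).
    Returns a list of (line_index, message) tuples; empty means no problems.
    """
    problems = []
    prev_count = None
    for i, line in enumerate(lines):
        tokens = line.split()
        if not tokens or not tokens[-1].isdigit():
            problems.append((i, "missing or non-numeric position counter"))
            prev_count = None  # the step after an unreadable counter cannot be checked
            continue
        count = int(tokens[-1])
        groups = tokens[:-1]
        if not all(GROUP.fullmatch(g) for g in groups):
            problems.append((i, "group with non-ACGT characters or more than 10 bases"))
        bases = sum(len(g) for g in groups)
        if bases != line_len:
            problems.append((i, f"{bases} bases, expected {line_len}"))
        if prev_count is not None and count - prev_count != line_len:
            problems.append((i, f"counter {count} does not follow {prev_count}"))
        prev_count = count
    return problems


if __name__ == "__main__":
    example = [
        "TGACTCAGAT TGGCTGCACT TTAAATTTTC CCATTAGTCC TATTGAGACT GTACCAGTAA 2580",
        "AATTAAAGCC AGGAATGGAT GGCCCAAAAG TTAAACAATG GCCATTGACA GAAGAAAAAA 2640",
    ]
    print(check_listing(example))  # []
```

A line that carries only three groups (30 bases) is reported as short rather than silently accepted, which makes it easy to spot lines whose other half sits elsewhere in the scan.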
(2) INFORMATION FOR SEQ ID NO: 36:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 74 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..74

(D) OTHER INFORMATION: /note= "primer #17982"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36:
GGGGCGTACG GAGCGCTCCG AATTCGGTAC CGTTTAAACG GGCCCTCTCG AGTCCGTTGT
ACAGTTCATC CATG



(2) INFORMATION FOR SEQ ID NO: 37: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 66 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..66

(D) OTHER INFORMATION: /note= "primer #17983"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37:
GGGGGAATTC GCGCGCGTAC GTAAGCGCTA GCTGAGCAAG AAATGGCTAG CAAAGGAGAA
GAACTC
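The declared LENGTH fields of SEQ ID NO: 36 and 37 can be checked directly against the sequences, and both primers appear to carry several six- and eight-base restriction sites (GAATTC, the EcoRI site, occurs in each). The Python below is a minimal sketch that counts the bases and scans for a small, hand-picked table of recognition sequences; the enzyme table is an assumption of this sketch rather than something stated in the listing, and it is not an exhaustive search.

```python
# Primer sequences copied from SEQ ID NO: 36 and NO: 37, with the
# ten-base grouping spaces removed.
SEQ36 = ("GGGGCGTACG GAGCGCTCCG AATTCGGTAC CGTTTAAACG GGCCCTCTCG AGTCCGTTGT "
         "ACAGTTCATC CATG").replace(" ", "")
SEQ37 = ("GGGGGAATTC GCGCGCGTAC GTAAGCGCTA GCTGAGCAAG AAATGGCTAG CAAAGGAGAA "
         "GAACTC").replace(" ", "")

# Hand-picked recognition sequences (an assumption of this sketch, not a
# complete enzyme list). All are palindromic, so scanning one strand is enough.
SITES = {
    "BsiWI": "CGTACG",
    "Eco47III": "AGCGCT",
    "EcoRI": "GAATTC",
    "KpnI": "GGTACC",
    "PmeI": "GTTTAAAC",
    "ApaI": "GGGCCC",
    "XhoI": "CTCGAG",
    "BssHII": "GCGCGC",
    "NheI": "GCTAGC",
}


def find_sites(seq, sites=SITES):
    """Map enzyme name -> 0-based start positions of its site (overlaps allowed)."""
    hits = {}
    for name, site in sites.items():
        positions = []
        start = seq.find(site)
        while start != -1:
            positions.append(start)
            start = seq.find(site, start + 1)
        if positions:
            hits[name] = positions
    return hits


if __name__ == "__main__":
    print(len(SEQ36), len(SEQ37))  # 74 66, matching the LENGTH fields
    print(find_sites(SEQ36))
    print(find_sites(SEQ37))
```

Because every site in the table reads the same on both strands, there is no need to scan the reverse complement; a table that included non-palindromic sites would need that extra pass.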
WHAT IS CLAIMED IS:



1. An isolated nucleic acid that encodes an engineered Aequorea victoria fluorescent protein, wherein the protein encoded by the isolated nucleic acid is selected from the group that consists of:

a. a protein that has leucine at amino acid position 65, and wherein said protein has a cellular fluorescence that is at least five times greater than the cellular fluorescence of wild type Aequorea victoria green fluorescent protein;

b. a protein that has leucine at amino acid position 65 and threonine at position 168, and wherein said protein has a cellular fluorescence that is at least five times greater than wild type Aequorea victoria green fluorescent protein;

c. a protein that has leucine at amino acid position 65, threonine at position 168, and cysteine at position 66, wherein said protein has a cellular fluorescence that is at least five times greater than the cellular fluorescence of wild type Aequorea victoria green fluorescent protein;

d. a blue fluorescent protein that has histidine at amino acid position 67, leucine at position 65 and has a cellular fluorescence that is at least five times greater than that of BFP (Tyr67→His);

e. a blue fluorescent protein that has histidine at amino acid position 67, alanine at amino acid position 164 and has a cellular fluorescence that is at least five times greater than that of BFP (Tyr67→His);

f. a blue fluorescent protein that has histidine at amino acid position 67, leucine at amino acid position 65, alanine at amino acid position 164 and has a cellular fluorescence that is at least five times greater than that of BFP (Tyr67→His).
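Each element of claim 1 pairs a set of residue requirements with a fluorescence threshold measured against a reference protein (wild type GFP for elements a to c, BFP (Tyr67→His) for elements d to f). As a reading aid, the Python below is a minimal sketch that encodes those conditions as written, assuming the protein is supplied as a one-letter string numbered from 1 in the same way as the claims and that the fold-changes have already been measured. All helper names are hypothetical, and the test protein is a placeholder string, not a real GFP sequence.

```python
# Residue requirements of claim 1, elements a-f, as {1-based position: residue}
# in the numbering used by the claims (one-letter amino-acid codes).
CLAIM1_ELEMENTS = {
    "a": {65: "L"},
    "b": {65: "L", 168: "T"},
    "c": {65: "L", 168: "T", 66: "C"},
    "d": {67: "H", 65: "L"},
    "e": {67: "H", 164: "A"},
    "f": {67: "H", 65: "L", 164: "A"},
}
# Elements compared with wild type GFP; the others are compared with BFP (Tyr67->His).
GFP_ELEMENTS = {"a", "b", "c"}


def has_residues(protein, required):
    """True if every required residue is present at its 1-based position."""
    return all(pos <= len(protein) and protein[pos - 1] == aa
               for pos, aa in required.items())


def matching_elements(protein, fold_vs_gfp, fold_vs_bfp, min_fold=5.0):
    """Claim-1 elements whose residues match and whose fluorescence ratio
    against the relevant reference protein is at least min_fold."""
    matches = []
    for key, required in CLAIM1_ELEMENTS.items():
        fold = fold_vs_gfp if key in GFP_ELEMENTS else fold_vs_bfp
        if fold >= min_fold and has_residues(protein, required):
            matches.append(key)
    return matches


def with_residues(length, changes, filler="G"):
    """Build a placeholder protein string with residues set at 1-based positions."""
    chars = [filler] * length
    for pos, aa in changes.items():
        chars[pos - 1] = aa
    return "".join(chars)


if __name__ == "__main__":
    placeholder = with_residues(240, {65: "L", 168: "T"})
    print(matching_elements(placeholder, fold_vs_gfp=8.0, fold_vs_bfp=1.0))  # ['a', 'b']
```

Keeping the two fold-changes as separate arguments mirrors the claim wording, where the "at least five times greater" comparison is made against a different reference protein for the green and blue variants.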
2. An isolated nucleic acid of claim 1, which encodes an engineered Aequorea victoria green fluorescent protein ("GFP") having a cellular fluorescence that is at least five times greater than that of wild type GFP, wherein the engineered GFP has a leucine at amino acid position 65.

3. An isolated nucleic acid according to claim 2, wherein the nucleic acid further encodes a threonine at amino acid position 168.

4. An isolated nucleic acid according to claim 3, wherein the nucleic acid further encodes a cysteine at amino acid position 66.

5. An isolated nucleic acid of claim 1 that encodes an engineered blue fluorescent protein ("BFP") that has histidine at amino acid position 67 and leucine at position 65, and has a cellular fluorescence that is at least five times greater than that of BFP (Tyr67→His).

6. An isolated nucleic acid of claim 1 that encodes an engineered blue fluorescent protein ("BFP") that has histidine at amino acid position 67 and alanine at amino acid position 164, and has a cellular fluorescence that is at least five times greater than that of BFP (Tyr67→His).

7. An isolated nucleic acid according to claim 6, wherein the nucleic acid further encodes leucine at amino acid position 65.

8. A transformed cell that expresses a protein encoded by a nucleic acid of claim 1.

9. A vector comprising a nucleic acid of claim 1.

10. A transformed cell comprising a vector of claim 9.
11. A transformed cell that expresses a protein encoded by the nucleic acid of claim 1 fused to a protein encoded by a second nucleic acid of interest.

12. An isolated engineered Aequorea victoria green fluorescent protein ("GFP") wherein the engineered GFP comprises leucine at amino acid position 65, said engineered GFP having a cellular fluorescence that is at least five times greater than wild type GFP.

13. An isolated engineered Aequorea victoria green fluorescent protein ("GFP") according to claim 12, wherein the engineered GFP has threonine at amino acid position 168.

14. An isolated engineered Aequorea victoria green fluorescent protein ("GFP") according to claim 13, wherein the engineered GFP has cysteine at amino acid position 66.

15. An isolated blue fluorescent protein ("BFP") that comprises histidine at amino acid position 67 and leucine at amino acid position 65 and has a cellular fluorescence that is at least five times greater than that of BFP (Tyr67→His).

16. An isolated blue fluorescent protein ("BFP") that has a histidine at amino acid position 67 and an alanine at amino acid position 164, that has a cellular fluorescence that is at least five times greater than that of BFP (Tyr67→His).

17. An isolated blue fluorescent protein ("BFP") according to claim 16, wherein the BFP further has leucine at amino acid position 65.

18. A method of detecting and optionally isolating an engineered cell that contains a selected nucleic acid which encodes a selected protein or nucleic acid, comprising:

a) stably introducing into a host cell in a population of host cells a vector that contains a first nucleic acid which encodes a polypeptide selected from the group consisting of SG11, SG12, SG25, SB42, SB49, and SB50, and a second nucleic acid which encodes a selected protein or nucleic acid, and

b) detecting cells in the population of host cells that express SG11, SG12, SG25, SB42, SB49, or SB50, and

c) optionally sorting cells that express SG11, SG12, SG25, SB42, SB49, or SB50 with a fluorescence-activated cell sorter to isolate individual cells that express said fluorescent protein.
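Step b) of claim 18 requires a decision about which cells count as fluorescent before any sorting in step c). The claim does not specify a gating rule, so the Python below is only a minimal sketch that assumes one common convention: a gate set at the mean plus three sample standard deviations of untransfected control cells. The function names, the factor `k`, and the example intensities are all hypothetical.

```python
from statistics import mean, stdev


def fluorescence_gate(control, k=3.0):
    """Gate = mean + k * sample standard deviation of control-cell intensities."""
    return mean(control) + k * stdev(control)


def select_positive(cells, control, k=3.0):
    """Indices of cells whose intensity lies above the control-derived gate."""
    gate = fluorescence_gate(control, k)
    return [i for i, value in enumerate(cells) if value > gate]


if __name__ == "__main__":
    control = [10, 12, 11, 9, 13]   # hypothetical untransfected-cell intensities
    cells = [8, 40, 15, 16, 300]    # hypothetical intensities from the transfected population
    print(select_positive(cells, control))  # [1, 3, 4]
```

A brighter reporter widens the gap between the positive population and the control distribution, so fewer expressing cells fall below such a gate.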

19. A nucleic acid construct wherein a coding 
sequence selected from the group consisting of sequences that 
encode SG11, SG12, SG25, SB42, SB49, and SB50 is operably 
linked to a regulatory sequence of a selected gene. 

20. A nucleic acid construct wherein a first coding 
sequence that encodes a selected polypeptide is fused using 
genetic engineering to a second coding sequence selected from 
the group consisting of sequences that encode SG11, SG12, 
SG25, SB42, SB49, and SB50, such that expression of the fused 
sequence yields a fluorescent hybrid protein in which the 
polypeptide encoded by the first coding sequence is fused to 
the polypeptide encoded by the second coding sequence. 
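For claim 20 to yield a single hybrid protein, the junction between the two coding sequences must preserve the reading frame, and the first sequence's stop codon must not be carried into the fusion. The Python below is a minimal sketch of that check under stated assumptions: the standard genetic code, no linker between the two sequences, and hypothetical helper names. The second coding sequence used in the example is the ATG-initiated stretch of primer SEQ ID NO: 37 (ATGGCTAGCAAAGGAGAAGAACTC).

```python
from itertools import product

# Standard genetic code; codons enumerated with bases in T, C, A, G order
# (first base varies slowest, third base fastest). "*" marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds):
    """Translate a DNA coding sequence whose length is a multiple of 3."""
    if len(cds) % 3:
        raise ValueError("coding sequence length is not a multiple of 3")
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def fuse_in_frame(first_cds, second_cds):
    """Join two coding sequences so that one continuous polypeptide is encoded.

    The stop codon of the first sequence (if present) is removed, and the
    fused sequence must translate without any internal stop codon.
    """
    if translate(first_cds).endswith("*"):
        first_cds = first_cds[:-3]
    fused = first_cds + second_cds
    protein = translate(fused)
    if "*" in protein.rstrip("*"):
        raise ValueError("internal stop codon in fused reading frame")
    return fused, protein


if __name__ == "__main__":
    fused, protein = fuse_in_frame("ATGAAACGTTAA", "ATGGCTAGCAAAGGAGAAGAACTC")
    print(protein)  # MKRMASKGEEL
```

Dropping only a terminal stop codon and rejecting any internal one is a deliberately narrow rule; real constructs often add a linker, which this sketch does not model.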

21. A method of detecting and characterizing 
regulatory and coding sequence elements that regulate 
subcellular expression and targeting of proteins, comprising: 

a) expressing in an engineered cell, in the presence and 
absence of selected culture conditions and components, a 
nucleic acid wherein a first nucleic acid selected from the 
group consisting of nucleic acids that encode SG11, SG12, 
SG25, SB42, SB49, and SB50 is operably linked to a second 
nucleic acid derived from a selected gene; 

b) detecting the presence and subcellular localization of 
fluorescent signal. 